In the United States the preferred method of obtaining dietary
intake data is the 24-hour dietary recall, yet the measure of most
interest is usual or long-term average daily intake, which is impossible
to measure. Thus, usual dietary intake is assessed with considerable
measurement error. Also, diet represents numerous foods, nutrients and
other components, each of which has distinctive attributes. Sometimes,
it is useful to examine intake of these components separately, but
increasingly nutritionists are interested in exploring them collectively
to capture overall dietary patterns. Consumption of these components
varies widely: some are consumed by almost everyone on every day, while
others are episodically consumed so that 24-hour recall data are
zero-inflated. In addition, they are often correlated with each other.
Finally, it is often preferable to analyze the amount of a dietary
component relative to the amount of energy (calories) in a diet because
dietary recommendations often vary with energy level. The quest to
understand overall dietary patterns of usual intake has to this point
reached a standstill. There are no statistical methods or models
available to model such complex multivariate data with its measurement
error and zero inflation.
This paper proposes the first such model, and it proposes the first
workable solution to fit such a model. After describing the model, 
we use survey-weighted MCMC computations to fit the model, with 
uncertainty estimation coming from balanced repeated replication. 
The methodology is illustrated through an application to estimating 
the population distribution of the Healthy Eating Index-2005 (HEI- 
2005), a multi-component dietary quality index involving ratios of 
interrelated dietary components to energy, among children aged 2-8 
in the United States. We pose a number of interesting questions about 
the HEI-2005 and provide answers that were not previously within 
the realm of possibility, and we indicate ways that our approach can 
be used to answer other questions of importance to nutritional science 
and public health. 

1. Introduction. This paper presents statistical models and methodol- 
ogy to overcome a major stumbling block in the field of dietary assessment. 
More nutritional background is provided in Section 2; a summary of the key
conceptual issues follows: 

• Nutritional surveys conducted in the United States typically use 24-hour 
(24 h) dietary recalls to obtain intake data, that is, an assessment of what 
was consumed in the past 24 hours. 

• Because dietary recommendations are intended to be met over time, nu- 
tritionists are interested in "usual" or long-term average daily intake. 

• Dietary intake is thus assessed with considerable measurement error. 

• Consumption patterns of dietary components vary widely; some are con- 
sumed daily by almost everyone, while others are episodically consumed 
so that 24-hour recall data are zero-inflated. Further, these components 
are correlated with one another. 

• Nutritionists are interested in dietary components collectively to capture 
patterns of usual dietary intake, and thus need multivariate models for 
usual intake. 

• These multivariate models for usual intakes, taking into account episodi- 
cally consumed foods, do not exist, nor do methods exist for fitting them. 

One way to capture dietary patterns is by scores, although our work is 
not limited to scores. The Healthy Eating Index-2005 (HEI-2005), described 
in detail in Section 2, is a scoring system based on a priori knowledge of
dietary recommendations, and is on a scale of 0-100. Ideally, it is based on
the usual intakes of 6 episodically consumed foods, whose 24 h recall data
are therefore zero-inflated, and 6 daily-consumed dietary components; it
adjusts these for energy (caloric) intake and gives a score to each
component. The total score is the sum of the
individual component scores. Higher scores indicate greater compliance with 
dietary guidelines and, therefore, a healthier diet. Here are a few questions 
that nutritionists have not been able to answer, and that our approach can 
address: 
• What is the distribution of the HEI-2005 total score, and what % of
Americans are eating a healthier diet defined, for example, by a total 
score exceeding 80? 

• What is the correlation between the individual score on each dietary com- 
ponent and the scores of all other dietary components? 

• Among those whose total HEI-2005 score is >50 or <50, what is the dis- 
tribution of usual intake of whole grains, whole fruits, dark green and 
orange vegetables and legumes (DOL) and calories from solid fats, alco- 
holic beverages and added sugars (SoFAAS)? 

• What % of Americans exceed the median score on all 12 HEI-2005 com- 
ponents? 

In this paper, to answer public health questions such as these that can 
have policy implications, we build a novel multivariate measurement error 
model for estimating the distributions of usual intakes, one that accounts 
for measurement error and zero-inflation, and has a special structure asso- 
ciated with the zero-inflation. Previous attempts to fit even simple versions 
of this model, using nonlinear mixed effects software, failed because of the 
complexity and dimensionality of the model. We use survey-weighted Monte 
Carlo computations to fit the model with uncertainty estimation coming 
from balanced repeated replication. The methodology is illustrated using 
the HEI-2005 to assess the diets of children aged 2-8 in the United States. 
This work represents the first analysis of joint distributions of usual intakes 
for multiple food groups and nutrients. 

The paper is outlined as follows. In Section 2 we give the background 
for the data we observe. In particular, we provide more information about 
the HEI-2005. Section 3 describes our model which is a highly nonlinear, 
zero-inflated, repeated measures model with multiple latent variables. The 
model also has a patterned covariance matrix with structural zeros and ones. 
We derive a parameterization that allows estimated covariance matrices to 
be actual covariance matrices. We also define technically what we mean by 
usual intake, and illustrate the use of simulation methods used to answer 
the questions posed above, as well as many others. 

Section 4 describes our estimation procedure. Previous attempts using 
nonlinear mixed effects models to estimate the distribution of episodically 
consumed food groups [Tooze et al. (2006); Kipnis et al. (2009)] do not 
work here because of the high dimensionality of the problem. We instead 
develop a Monte Carlo strategy based on the idea of Gibbs sampling; al- 
though because of sampling weights, we treat the method as a frequentist 
(non-Bayesian) one. This section describes some of the basics of the method- 
ology; the full technical details of implementation are given in the Appendix. 

Section 5 describes the analysis of the HEI-2005 components using the 
2001-2004 National Health and Nutrition Examination Survey (NHANES) 
for children ages 2-8. Important contextual points arise because of the nature
of the data. For example, if whole grains are consumed, then necessarily total 
grains are consumed with probability one, a restriction that a naive use of 
our model cannot handle. We develop a simple novel device to uncouple 
consumption variables that are tightly linked in this way. Finally, in this 
section we provide the first answers to the four questions we have posed. In 
Section 6 we discuss various additional aspects of the problem and the data 
analysis. Concluding remarks and a policy application are given in Section 7. 

There are a number of general reviews of the measurement error field
[Fuller (1987); Gustafson (2004); Carroll et al. (2006); Buonaccorsi (2010)].
Recent papers that focus on estimating the density function of a univariate
continuous random variable subject to measurement error include Delaigle
(2008), Delaigle and Hall (2008, 2011), Delaigle and Meister (2008), Delaigle,
Hall and Meister (2008), Staudenmayer, Ruppert and Buonaccorsi (2008)
and Wand (1998). The field of measurement error in regression continues
to expand rapidly, with some recent contributions including Küchenhoff,
Mwalili and Lesaffre (2006), Guolo (2008), Liang et al. (2008), Messer and
Natarajan (2008) and Natarajan (2009). There is also a large statistical 
literature on measurement error as it relates to public health nutrition: some 
recent papers relevant to our work include Carriquiry (1999, 2003), Ferrari 
et al. (2009), Fraser and Shavlik (2004), Kott et al. (2009), Nusser et al. 
(1996), Nusser, Fuller and Guenther (1997), Prentice (1996, 2003), Tooze, 
Grunwald and Jones (2002) and Tooze et al. (2006). 

2. Data and the HEI-2005 scores. Here we give more detail about the 
nutrition context that motivates this work. 

In surveys conducted in the United States, the preferred method of ob- 
taining intake data is the 24-hour dietary recall because it limits respondent 
burden and facilitates accurate reporting; yet the measure of greatest in- 
terest is "usual" or long-term average daily intake. Thus, dietary intake is 
assessed with considerable measurement error. Also, diets are comprised of 
numerous foods, nutrients and other components, each of which may have 
distinctive attributes and effects on nutritional health. Sometimes, it is use- 
ful to examine intake of these components separately, but increasingly nu- 
tritionists are interested in exploring them collectively to capture patterns 
of dietary intake. Consumption patterns of these components vary widely; 
some are consumed daily by almost everyone, while others are episodically 
consumed so that 24-hour recall data are zero-inflated. In addition, these 
various components are often correlated with one another. Finally, it is of- 
ten preferable to analyze the amount of a dietary component relative to the 
amount of energy (calories) in a diet because dietary recommendations often 
vary with energy level, and this approach provides a way of standardizing 
dietary assessments. 
One of the US Department of Agriculture's (USDA's) strategic objec-
tives is "to promote healthy diets" and it has developed an associated 
performance measure, the Healthy Eating Index-2005 (HEI-2005, http:// 
www.cnpp.usda.gov/HealthyEatingIndex.htm). The HEI-2005 is based on 
the key recommendations of the 2005 Dietary Guidelines for Americans 
(http://www.health.gov/dietaryguidelines/dga2005/document/default.htm). 
The index includes ratios of interrelated dietary components to energy. The 
HEI-2005 comprises 12 distinct component scores and a total summary score. 
See Table 1 for a list of these components and the standards for scoring, and 
see Guenther, Reedy and Krebs-Smith (2008) for details. Intakes of each
food or nutrient, represented by one of the 12 components, are expressed as
a ratio to energy intake, assessed, and ascribed a score.

Table 1
Description of the HEI-2005 scoring system. Except for saturated fat and
SoFAAS, density is obtained by multiplying usual intake by 1000 and dividing
by usual intake of kilocalories

Component          Units         HEI-2005 score calculation
Total fruit        cups          min(5, 5 × (density/0.8))
Whole fruit        cups          min(5, 5 × (density/0.4))
Total vegetables   cups          min(5, 5 × (density/1.1))
DOL                cups          min(5, 5 × (density/0.4))
Total grains       ounces        min(5, 5 × (density/3))
Whole grains       ounces        min(5, 5 × (density/1.5))
Milk               cups          min(10, 10 × (density/1.3))
Meat and beans     ounces        min(10, 10 × (density/2.5))
Oil                grams         min(10, 10 × (density/12))
Saturated fat      % of energy   if density > 15 score = 0
                                 else if density < 7 score = 10
                                 else if density > 10 score = 8 - (8 × (density - 10)/5)
                                 else, score = 10 - (2 × (density - 7)/3)
Sodium             milligrams    if density > 2000 score = 0
                                 else if density < 700 score = 10
                                 else if density > 1100
                                   score = 8 - {8 × (density - 1100)/(2000 - 1100)}
                                 else score = 10 - {2 × (density - 700)/(1100 - 700)}
SoFAAS             % of energy   if density > 50 score = 0
                                 else if density < 20 score = 20
                                 else score = 20 - {20 × (density - 20)/(50 - 20)}

For saturated fat, density is 9 × 100 × usual saturated fat (grams) divided by
usual calories, that is, the percentage of usual calories coming from usual
saturated fat intake. For SoFAAS, the density is the percentage of usual
calories that comes from SoFAAS, that is, 100 times usual intake of SoFAAS
calories divided by usual intake of calories. Here, "DOL" is dark green and
orange vegetables and legumes. Also, "SoFAAS" is calories from solid fats,
alcoholic beverages and added sugars. The total HEI-2005 score is the sum of
the individual component scores.
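As an illustration of how the Table 1 rules turn energy-adjusted densities into scores, here is a minimal Python sketch. The function names and dictionary keys are hypothetical and chosen only for this example; the thresholds are transcribed from Table 1, and the inputs are assumed to be densities already expressed in the units the table requires.

```python
# Illustrative helpers transcribed from Table 1. All names are hypothetical.

# Adequacy components: name -> (density earning full credit, maximum score)
ADEQUACY = {
    "total_fruit": (0.8, 5.0),
    "whole_fruit": (0.4, 5.0),
    "total_vegetables": (1.1, 5.0),
    "dol": (0.4, 5.0),
    "total_grains": (3.0, 5.0),
    "whole_grains": (1.5, 5.0),
    "milk": (1.3, 10.0),
    "meat_and_beans": (2.5, 10.0),
    "oil": (12.0, 10.0),
}


def adequacy_score(density, standard, max_score):
    """More is better: min(max, max * density / standard)."""
    return min(max_score, max_score * density / standard)


def moderation_score(density, best, mid, worst):
    """Saturated fat and sodium: 10 below `best`, 8 at `mid`, 0 above `worst`."""
    if density > worst:
        return 0.0
    if density < best:
        return 10.0
    if density > mid:
        return 8.0 - 8.0 * (density - mid) / (worst - mid)
    return 10.0 - 2.0 * (density - best) / (mid - best)


def sofaas_score(density):
    """SoFAAS: 20 below 20% of energy, falling linearly to 0 at 50%."""
    if density > 50.0:
        return 0.0
    if density < 20.0:
        return 20.0
    return 20.0 - 20.0 * (density - 20.0) / (50.0 - 20.0)


def hei2005_scores(density):
    """Map a dict of 12 densities to component scores plus their total."""
    scores = {
        name: adequacy_score(density[name], standard, max_score)
        for name, (standard, max_score) in ADEQUACY.items()
    }
    scores["saturated_fat"] = moderation_score(density["saturated_fat"], 7.0, 10.0, 15.0)
    scores["sodium"] = moderation_score(density["sodium"], 700.0, 1100.0, 2000.0)
    scores["sofaas"] = sofaas_score(density["sofaas"])
    scores["total"] = sum(scores.values())  # summed before "total" is stored
    return scores
```

The maximum component scores add up to 6 × 5 + 5 × 10 + 20 = 100, which matches the 0-100 scale of the total score.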

The HEI-2005 is used to evaluate the diets of Americans to assess compli-
ance with the 2005 Dietary Guidelines, yet use of the HEI-2005 is limited by
the challenges described above. Until recently, there have been no solutions
to these challenges, so published evaluations have been limited to analyses 
of mean scores for the population and various subgroups. Freedman et al. 
(2010) have described a method of estimating the population distribution 
of a single component of HEI-2005, and the prevalence of high or low scores 
on that component; but there has been to date no satisfactory way to de- 
termine the prevalence of high or low total HEI-2005 scores, considering all 
of its interrelated components simultaneously. In addition, answers to the 
complex questions posed in the Introduction remain unavailable. This paper 
aims to provide a means to do these crucial evaluations. 

The 12 HEI-2005 components represent 6 episodically consumed food 
groups (total fruit, whole fruit, total vegetables, dark green and orange veg- 
etables and legumes or DOL, whole grains and milk), 3 daily-consumed food 
groups (total grains, meat and beans and oils) and 3 other daily-consumed 
dietary components (saturated fat; sodium; and calories from solid fats, al- 
coholic beverages and added sugars, or SoFAAS). The classification of food 
groups as "episodically" and "daily" consumed is based on the number of 
individuals who report them on 24 h recalls. If there are only a few zeros for 
a component, we treat that as a daily-consumed food, and replace all zeros 
with 1/2 the minimum value of the nonzeros for that food. However, the 
crucial statistical aspect of the data is that six of the food groups are zero- 
inflated. The percentages of reported nonconsumption of total fruit, whole 
fruit, whole grains, total vegetables, DOL and milk on any single day are 
17%, 40%, 42%, 3%, 50% and 12%, respectively. 
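The zero-replacement rule for daily-consumed components can be sketched as follows. This is a minimal illustration assuming the reported amounts for one component across all recalls are held in a NumPy array; the function name is hypothetical.

```python
import numpy as np


def replace_zeros_half_min(amounts):
    """Replace zeros with half the smallest nonzero value of the component."""
    a = np.asarray(amounts, dtype=float).copy()
    nonzero = a[a > 0]
    if nonzero.size == 0:
        raise ValueError("component has no nonzero reports")
    a[a == 0] = 0.5 * nonzero.min()
    return a
```

The rule is applied only to components with few zeros; the six episodically consumed food groups keep their zeros, which the model handles explicitly.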

We are interested in the usual intake of foods for children aged 2-8. The 
data available to us, described in more detail in Section 5, came from the 
National Health and Nutrition Examination Survey, 2001-2004 (NHANES). 
The data used here consisted of n = 2,638 children, each of whom had a sur-
vey weight $w_i$ for $i = 1, \ldots, n$. In addition, one or two 24 h dietary recalls
were available for each individual. Along with the dietary variables, there 
are covariates such as age, gender, ethnicity, family income and dummy vari- 
ables that indicate a weekday or a weekend day, and whether the recall was 
the first or second reported for that individual. 

Using the 24 h recall data reported, for each of the episodically consumed 
food groups, two variables are defined: (a) whether a food from that group 
was consumed; and (b) the amount of the food that was reported on the 24 h 
recall. For the 6 daily-consumed food groups and nutrients, only one variable 
indicating the consumption amount is defined. In addition, the amount of 
energy that is calculated from the 24 h recall is of interest. The number of 
dietary variables for each 24 h recall is thus 12 + 6 + 1 = 19. The observed
data are $Y_{ijk}$ for the $i$th person, the $j$th variable and the $k$th replicate,
$j = 1, \ldots, 19$ and $k = 1, \ldots, m_i$. In the data set, at most two 24 h recalls were
observed, so that $m_i \le 2$. Set $\widetilde{Y}_{ik} = (Y_{i1k}, \ldots, Y_{i,19,k})^{\mathrm{T}}$, where

• $Y_{i,2\ell-1,k}$ = Indicator of whether dietary component #$\ell$ is consumed, with
$\ell = 1, 2, 3, 4, 5, 6$.

• $Y_{i,2\ell,k}$ = Amount of food #$\ell$ consumed. This equals zero, of course, if none
of food #$\ell$ is consumed, with $\ell = 1, 2, 3, 4, 5, 6$.

• $Y_{i,\ell+6,k}$ = Amount of nonepisodically consumed food or nutrient #$\ell$, with
$\ell = 7, 8, 9, 10, 11, 12$.

• $Y_{i,19,k}$ = Amount of energy consumed as reported by the 24 h recall.
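The indexing of the 19 observed variables can be made concrete with a short sketch. The function and argument names below are hypothetical; the code assumes one recall is summarized by six episodic amounts, six daily amounts and energy, and stores the variable with 1-based index j at 0-based position j - 1.

```python
import numpy as np


def build_recall_vector(episodic_amounts, daily_amounts, energy):
    """Assemble the 19 observed variables for one person and one recall."""
    if len(episodic_amounts) != 6 or len(daily_amounts) != 6:
        raise ValueError("expected 6 episodic and 6 daily amounts")
    y = np.empty(19)
    for ell in range(1, 7):
        amount = float(episodic_amounts[ell - 1])
        y[2 * ell - 2] = 1.0 if amount > 0 else 0.0  # variable 2*ell - 1: indicator
        y[2 * ell - 1] = amount                      # variable 2*ell: amount
    for ell in range(7, 13):
        y[ell + 6 - 1] = float(daily_amounts[ell - 7])  # variable ell + 6
    y[18] = float(energy)                               # variable 19: energy
    return y
```

For example, a recall with no whole fruit reported yields a 0 indicator and a 0 amount in the corresponding pair of positions.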

3. Model and methods. 

3.1. Basic model description. Our model is a generalization of work by 
Tooze et al. (2006) and Kipnis et al. (2009) for a single food and Kipnis et al. 
(2011) and Zhang et al. (2011) for a single food and nutrient. Observed data 
will be denoted as Y, and covariates in the model will be denoted as X. As
is usual in measurement error problems, there will also be latent variables, 
which will be denoted by W. 

We use a probit threshold model. Each of the 6 episodically consumed
foods will have 2 sets of latent variables, one for consumption and one
for amount, while the 6 daily-consumed foods and nutrients as well as en-
ergy will have 1 set of latent variables, for a total of 19. The latent ran-
dom variables are $\epsilon_{ijk}$ and $U_{ij}$, where $(U_{i1}, \ldots, U_{i,19}) = \mathrm{Normal}(0, \Sigma_u)$ and
$(\epsilon_{i1k}, \ldots, \epsilon_{i,19,k}) = \mathrm{Normal}(0, \Sigma_\epsilon)$ are mutually independent. In this model,
food $\ell = 1, \ldots, 6$ being consumed on day $k$ is equivalent to observing the
binary $Y_{i,2\ell-1,k}$, where

(3.1)   $Y_{i,2\ell-1,k} = 1 \iff W_{i,2\ell-1,k} = X_{i,2\ell-1,k}^{\mathrm{T}} \beta_{2\ell-1} + U_{i,2\ell-1} + \epsilon_{i,2\ell-1,k} > 0.$

If the food is consumed, we model the amount reported $Y_{i,2\ell,k}$ as

(3.2)   $[g_{\mathrm{tr}}(Y_{i,2\ell,k}, \lambda_\ell) \mid Y_{i,2\ell-1,k} = 1] = W_{i,2\ell,k} = X_{i,2\ell,k}^{\mathrm{T}} \beta_{2\ell} + U_{i,2\ell} + \epsilon_{i,2\ell,k},$

where $g_{\mathrm{tr}}(y, \lambda) = \sqrt{2}\{g(y, \lambda) - \mu(\lambda)\}/\sigma(\lambda)$, $g(y, \lambda)$ is the usual Box-Cox trans-
formation with transformation parameter $\lambda$, and $\{\mu(\lambda), \sigma(\lambda)\}$ are the sample
mean and standard deviation of $g(y, \lambda)$, computed from the nonzero food
data. This standardization is simply a convenient device to improve the nu-
merical performance of our algorithm without affecting the conclusions of
our analysis.
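The standardized Box-Cox transformation can be sketched in a few lines. This is an illustration only: the function names are hypothetical, the factor $\sqrt{2}$ follows the definition of $g_{\mathrm{tr}}$ as read from the text, and the use of the `ddof=1` sample standard deviation is an assumption, since the text says only "sample standard deviation".

```python
import numpy as np


def box_cox(y, lam):
    """Usual Box-Cox transformation; the log transform when lam == 0."""
    y = np.asarray(y, dtype=float)
    return np.log(y) if lam == 0 else (y ** lam - 1.0) / lam


def fit_standardization(nonzero_amounts, lam):
    """Sample mean and standard deviation of g(y, lam) over nonzero data."""
    g = box_cox(nonzero_amounts, lam)
    return g.mean(), g.std(ddof=1)  # ddof=1 is an assumption


def g_tr(y, lam, mu, sigma):
    """Standardized transformation sqrt(2) * (g(y, lam) - mu) / sigma."""
    return np.sqrt(2.0) * (box_cox(y, lam) - mu) / sigma
```

Applied to the same nonzero data used to compute `mu` and `sigma`, the transformed values have sample mean 0 and sample variance 2.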
The reported consumption of daily consumed foods or nutrients $\ell =
7, \ldots, 12$ is modeled as

(3.3)   $g_{\mathrm{tr}}(Y_{i,\ell+6,k}, \lambda_\ell) = W_{i,\ell+6,k} = X_{i,\ell+6,k}^{\mathrm{T}} \beta_{\ell+6} + U_{i,\ell+6} + \epsilon_{i,\ell+6,k}.$

Finally, energy is modeled as

(3.4)   $g_{\mathrm{tr}}(Y_{i,19,k}, \lambda_{13}) = W_{i,19,k} = X_{i,19,k}^{\mathrm{T}} \beta_{19} + U_{i,19} + \epsilon_{i,19,k}.$

As seen in (3.2)-(3.4), different transformations $(\lambda_1, \ldots, \lambda_{13})$ are allowed to
be used for the different types of dietary components; see Section A.12.

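In summary, there are latent variables $W_{ik} = (W_{i1k}, \ldots, W_{i,19,k})^{\mathrm{T}}$, latent
random effects $U_i = (U_{i1}, \ldots, U_{i,19})^{\mathrm{T}}$, fixed effects $(\beta_1, \ldots, \beta_{19})$, and design
matrices $(X_{i1k}, \ldots, X_{i,19,k})$. Define $\epsilon_{ik} = (\epsilon_{i1k}, \ldots, \epsilon_{i,19,k})^{\mathrm{T}}$. The latent vari-
able model is

(3.5)   $W_{ijk} = X_{ijk}^{\mathrm{T}} \beta_j + U_{ij} + \epsilon_{ijk},$

where $U_i = \mathrm{Normal}(0, \Sigma_u)$ and $\epsilon_{ik} = \mathrm{Normal}(0, \Sigma_\epsilon)$ are mutually indepen-
dent.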
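To make the latent-variable structure concrete, the following sketch simulates recalls for a single episodically consumed food, that is, one consumption equation and one amount equation. It is a simplified illustration rather than the full 19-dimensional model: it assumes intercept-only covariates, independent within-person errors with unit variance in the consumption equation, and a plain log transformation in place of $g_{\mathrm{tr}}$. All names are hypothetical.

```python
import numpy as np


def simulate_episodic_food(n, n_recalls, beta_prob, beta_amt, sigma_u, var_eps_amt, rng):
    """Simulate consumption indicators and amounts for one episodic food."""
    # Person-level random effects (U_{i1}, U_{i2}), shared across recalls.
    U = rng.multivariate_normal(np.zeros(2), sigma_u, size=n)
    # Day-level errors; the consumption error has variance fixed at 1.
    eps_prob = rng.normal(0.0, 1.0, size=(n, n_recalls))
    eps_amt = rng.normal(0.0, np.sqrt(var_eps_amt), size=(n, n_recalls))
    W_prob = beta_prob + U[:, [0]] + eps_prob  # latent consumption variable
    W_amt = beta_amt + U[:, [1]] + eps_amt     # latent transformed amount
    consumed = (W_prob > 0).astype(int)        # probit threshold at zero
    # Back-transform with exp (log-scale assumption); zero if not consumed.
    amount = np.where(consumed == 1, np.exp(W_amt), 0.0)
    return consumed, amount
```

A call such as `simulate_episodic_food(200, 2, 0.3, 0.0, np.array([[0.5, 0.2], [0.2, 0.8]]), 0.6, np.random.default_rng(1))` returns two 200 × 2 arrays in which every zero amount coincides with a zero indicator, which is the zero-inflation pattern seen in 24 h recall data.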



3.2. Restriction on the covariance matrix. Two necessary restrictions are
set on $\Sigma_\epsilon$. First, following Kipnis et al. (2009, 2011), $\epsilon_{i,2\ell-1,k}$ and $\epsilon_{i,2\ell,k}$
$(\ell = 1, \ldots, 6)$ are set to be independent. Second, in order to technically
identify $\beta_{2\ell-1}$ and the distribution of $U_{i,2\ell-1}$ $(\ell = 1, \ldots, 6)$, we require that
$\mathrm{var}(\epsilon_{i,2\ell-1,k}) = 1$, because otherwise the marginal probability of consump-
tion of dietary component #$\ell$ would be $\Phi\{(X_{i,2\ell-1,k}^{\mathrm{T}} \beta_{2\ell-1} + U_{i,2\ell-1}) /
\mathrm{var}^{1/2}(\epsilon_{i,2\ell-1,k})\}$, and thus components of $\beta$ and $\Sigma_u$ would be identified
only up to the scale $\mathrm{var}^{1/2}(\epsilon_{i,2\ell-1,k})$.
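The identifiability argument can be checked numerically: multiplying the linear predictor, the random effect and the error standard deviation by the same constant leaves the consumption probability unchanged. The sketch below uses only the standard library; the function names are hypothetical.

```python
import math


def std_normal_cdf(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def consumption_probability(x_beta, u, sd_eps):
    """P(W > 0 | U = u) when W = x_beta + u + eps and eps ~ N(0, sd_eps^2)."""
    return std_normal_cdf((x_beta + u) / sd_eps)
```

Here `consumption_probability(0.4, 0.2, 1.0)` and `consumption_probability(0.8, 0.4, 2.0)` agree, so the data cannot distinguish the two parameter settings unless the error variance is fixed, which is why it is set to 1.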

So that we can handle any number of episodically consumed dietary com- 
ponents and any number of daily consumed components, suppose that there 
are J episodically consumed dietary components, and K daily consumed 
dietary components, and in addition there is energy. Then the restrictions 
defined above lead to the covariance matrix 



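(3.6)
$$
\Sigma_\epsilon =
\begin{pmatrix}
1 & 0 & s_{13} & s_{14} & \cdots & s_{1,2J+1} & \cdots & s_{1,2J+K+1} \\
0 & s_{22} & s_{23} & s_{24} & \cdots & s_{2,2J+1} & \cdots & s_{2,2J+K+1} \\
s_{13} & s_{23} & 1 & 0 & \cdots & s_{3,2J+1} & \cdots & s_{3,2J+K+1} \\
s_{14} & s_{24} & 0 & s_{44} & \cdots & s_{4,2J+1} & \cdots & s_{4,2J+K+1} \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & & \vdots \\
s_{1,2J+1} & s_{2,2J+1} & s_{3,2J+1} & s_{4,2J+1} & \cdots & s_{2J+1,2J+1} & \cdots & s_{2J+1,2J+K+1} \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \ddots & \vdots \\
s_{1,2J+K+1} & s_{2,2J+K+1} & s_{3,2J+K+1} & s_{4,2J+K+1} & \cdots & s_{2J+1,2J+K+1} & \cdots & s_{2J+K+1,2J+K+1}
\end{pmatrix}.
$$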



The difficulty with parameterizations of (3.6) is that the cells that are not
constrained to be 0 or 1 cannot be left unconstrained, otherwise (3.6) need
not be a covariance matrix, that is, positive semidefinite.

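We have developed an unconstrained parameterization that results in the
structure (3.6). Consider an unconstrained lower triangular matrix $V$ and
define $\Sigma_\epsilon = VV^{\mathrm{T}}$. This is positive semidefinite and therefore qualifies $\Sigma_\epsilon$ as
a proper covariance matrix. The form of $V$ is

$$
V =
\begin{pmatrix}
v_{11} & 0 & \cdots & 0 \\
v_{21} & v_{22} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
v_{2J+K+1,1} & v_{2J+K+1,2} & \cdots & v_{2J+K+1,2J+K+1}
\end{pmatrix}.
$$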

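To achieve the desired pattern (3.6), we derive the following four restrictions:

$$
\begin{aligned}
v_{11} &= 1; \\
v_{21} &= 0; \\
\sum_{p=1}^{q} v_{qp}^2 &= 1, \qquad q = 3, 5, \ldots, 2J-1; \\
\sum_{p=1}^{q} v_{qp} v_{q+1,p} &= 0, \qquad q = 3, 5, \ldots, 2J-1.
\end{aligned}
$$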



The third restriction can be ensured by the further parameterization

$$
\begin{aligned}
v_{31} &= r_1 \sin(\theta_1); \\
v_{32} &= r_1 \cos(\theta_1); \\
v_{33} &= \sqrt{1 - r_1^2}; \\
v_{2q+1,1} &= r_q \sin(\theta_{1+(q-1)^2}); \\
v_{2q+1,p} &= r_q \cos(\theta_{1+(q-1)^2}) \times \cdots \times \cos(\theta_{p-1+(q-1)^2}) \sin(\theta_{p+(q-1)^2}), \qquad p = 2, \ldots, 2q-1; \\
v_{2q+1,2q} &= r_q \cos(\theta_{1+(q-1)^2}) \times \cdots \times \cos(\theta_{q^2}); \\
v_{2q+1,2q+1} &= \sqrt{1 - r_q^2},
\end{aligned}
$$

where $q = 2, 3, \ldots, J-1$, $|r_t| < 1$, $t = 1, \ldots, J-1$, and $|\theta_s| < \pi$, $s = 1, \ldots,
(J-1)^2$.

Similarly, the fourth restriction can be further expressed by setting

$$
v_{q+1,q} = -\sum_{p=1}^{q-1} v_{qp} v_{q+1,p} / v_{qq}
          = -\sum_{p=1}^{q-1} v_{qp} v_{q+1,p} \Big/ \sqrt{1 - r_{(q-1)/2}^2},
$$

where $q = 3, 5, \ldots, 2J-1$.

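Note that $|\Sigma_\epsilon| = |V|^2 = \prod_{q=1}^{2J+K+1} v_{qq}^2 = \prod_{q=1}^{J} v_{2q,2q}^2 \prod_{q=2J+1}^{2J+K+1} v_{qq}^2 \prod_{q=1}^{J-1} (1 - r_q^2)$.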
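The parameterization of Section 3.2 can be sketched numerically. The code below builds a lower triangular $V$ satisfying the four restrictions and leaves checking the pattern of $\Sigma_\epsilon = VV^{\mathrm{T}}$ to the caller. It is an illustration under stated assumptions, not the authors' implementation: the free entries, radii and angles are drawn at random purely to produce an example, the radii are kept inside (-0.9, 0.9) so the divisions are well conditioned, and the function names are hypothetical.

```python
import numpy as np


def sphere_point(angles):
    """Unit vector of length len(angles) + 1 built from spherical angles."""
    n = len(angles) + 1
    x = np.empty(n)
    cos_prod = 1.0
    for p, theta in enumerate(angles):
        x[p] = cos_prod * np.sin(theta)  # cos(...)...cos(...) sin(theta_p)
        cos_prod *= np.cos(theta)
    x[n - 1] = cos_prod                  # product of all cosines
    return x


def build_V(J, K, rng):
    """Lower triangular V whose V V^T has the 0/1 pattern of (3.6)."""
    d = 2 * J + K + 1
    V = np.zeros((d, d))
    V[0, 0] = 1.0                                   # first restriction
    for row in range(1, d):
        q = row + 1                                 # 1-based row index
        if q % 2 == 1 and q <= 2 * J - 1:
            # Third restriction: row q has unit Euclidean norm.
            r = rng.uniform(-0.9, 0.9)
            angles = rng.uniform(-np.pi, np.pi, size=q - 2)
            V[row, : q - 1] = r * sphere_point(angles)
            V[row, q - 1] = np.sqrt(1.0 - r ** 2)
        else:
            V[row, :q] = rng.normal(size=q)         # unconstrained entries
            if q % 2 == 0 and q <= 2 * J:
                # Second and fourth restrictions: row q is orthogonal to row q - 1.
                prev = row - 1
                V[row, q - 2] = -(V[prev, : q - 2] @ V[row, : q - 2]) / V[prev, q - 2]
    return V
```

With `V = build_V(3, 2, np.random.default_rng(0))`, the product `V @ V.T` has ones at the diagonal positions of the three consumption errors and zeros between each consumption error and its paired amount error, while remaining positive semidefinite by construction.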

3.3. The use of sampling weights. As described in the Appendix, we 
used the survey sample weights from NHANES both in the model fitting 
procedure and, after having fit the model, in estimating the distributions of 
usual intake. 

While not displayed here, we redid the model fitting calculations without 
weighting, because the covariates we use are major players in determining 
the sampling weights, hence, it is reasonable to believe that the model in 
Section 3 holds both in the sample and in the population. When we did this, 
the parameter estimates were essentially unchanged. 

Thus, we use the sampling weights only for estimation of the population 
distributions. We actually did this for the purpose of handling the cluster- 
ing in the sample design. For such a complex statistical procedure as ours, 
we knew we could not do theoretical standard errors, so we thought about 
the bootstrap, and realized that putting together a bootstrap for the com- 
plex survey would be nearly impossible. However, we already had developed 
a set of Balanced Repeated Replication (BRR) weights [Wolter (1995)]; see 
Section 5.7 for details. These BRR weights have the property that, in the 
frequentist survey sampling sense, they appropriately reflect the clustering 
in the standard error calculations. 

Of course, the use of sampling weights in the modeling provides unbiased
estimates of the (super) population parameters of interest. In addition, the
use of sampling weights in the distribution estimation provides an estimated 
distribution that is representative of the US population, not just the sample. 

3.4. Distribution of usual intake and the HEI-2005 scores. We assume
here that estimates of $\Sigma_u$, $\Sigma_\epsilon$ and $\beta_j$ for $j = 1, \ldots, 19$ have been constructed;
see Section 4. Here we discuss what we mean by usual intake for an individ- 
ual, how to estimate the distribution of usual intakes, how to convert usual 
intakes into HEI-2005 scores, and how to assess uncertainty. 
Consider the first episodically consumed dietary component, a food group,
with reporting being done on a weekend. Set $X_{i1,\mathrm{wkend}}$ and $X_{i2,\mathrm{wkend}}$ to be
the versions of $X_{i1k}$ and $X_{i2k}$ where the dummy variable has the indicator
of the weekend and that the recall is the first one. Following Kipnis et al.
(2009), we define the usual intake for an individual on the weekend to be
the expectation of the reported intake conditional on the person's random
effects $U_i$. Let the $(q,p)$ element of $\Sigma_\epsilon$ be denoted as $\Sigma_{\epsilon,q,p}$. As in Kipnis
et al. (2009) define

(3.7)   $g_{\mathrm{tr}}^{*}(v, \lambda, \Sigma_{\epsilon,q,p}) = g_{\mathrm{tr}}^{-1}(v, \lambda) + \frac{1}{2} \Sigma_{\epsilon,q,p} \frac{\partial^2 g_{\mathrm{tr}}^{-1}(v, \lambda)}{\partial v^2}.$

Detailed formulas for this are given in Appendix A.11. Then, following the
convention of Kipnis et al. (2009), the person's usual intake of the first
episodically consumed dietary component on the weekend is defined as

$T_{i1,\mathrm{wkend}} = \Phi(X_{i1,\mathrm{wkend}}^{\mathrm{T}} \beta_1 + U_{i1}) \, g_{\mathrm{tr}}^{*}(X_{i2,\mathrm{wkend}}^{\mathrm{T}} \beta_2 + U_{i2}, \lambda_1, \Sigma_{\epsilon,2,2}).$

Similarly, let $X_{i1,\mathrm{wkday}}$ and $X_{i2,\mathrm{wkday}}$ be as above but the dummy variable
is appropriate for a weekday. Then the person's usual intake of the first
episodically consumed food group on weekdays is defined as

$T_{i1,\mathrm{wkday}} = \Phi(X_{i1,\mathrm{wkday}}^{\mathrm{T}} \beta_1 + U_{i1}) \, g_{\mathrm{tr}}^{*}(X_{i2,\mathrm{wkday}}^{\mathrm{T}} \beta_2 + U_{i2}, \lambda_1, \Sigma_{\epsilon,2,2}).$

Finally, the usual intake of the first episodically consumed food for the in-
dividual is

$T_{i1} = (4 T_{i1,\mathrm{wkday}} + 3 T_{i1,\mathrm{wkend}})/7,$

since Fridays, Saturdays and Sundays are considered to be weekend days.
Usual intake for the other episodically consumed food groups is defined
similarly.

A person's usual intake of a daily-consumed food group/nutrient and en-
ergy on the original scale is defined similarly. Consider, for example, energy,
which is the 13th dietary component and the 19th set of terms in the model.
Let $X_{i,19,\mathrm{wkend}}$ and $X_{i,19,\mathrm{wkday}}$ be the versions of $X_{i,19,k}$ where the dummy
variable has the indicator of the weekend or weekday, respectively, and that
the recall is the first one. Then

$T_{i,13,\mathrm{wkend}} = g_{\mathrm{tr}}^{*}(X_{i,19,\mathrm{wkend}}^{\mathrm{T}} \beta_{19} + U_{i,19}, \lambda_{13}, \Sigma_{\epsilon,19,19});$
$T_{i,13,\mathrm{wkday}} = g_{\mathrm{tr}}^{*}(X_{i,19,\mathrm{wkday}}^{\mathrm{T}} \beta_{19} + U_{i,19}, \lambda_{13}, \Sigma_{\epsilon,19,19});$
$T_{i,13} = (4 T_{i,13,\mathrm{wkday}} + 3 T_{i,13,\mathrm{wkend}})/7.$

Similar formulae are used for the other daily-consumed foods and nutrients.
Finally, the energy-adjusted usual intakes and the HEI-2005 scores are
then obtained as in Table 1, using the estimated usual intakes of the dietary
components.
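The usual-intake definitions above can be sketched for a single episodically consumed food. This is a hedged illustration, not the formulas of Appendix A.11: it assumes the standardized Box-Cox form $g_{\mathrm{tr}}(y, \lambda) = \sqrt{2}\{g(y, \lambda) - \mu\}/\sigma$, derives the second derivative of its inverse directly, and takes the linear predictors as precomputed numbers. All function and argument names are hypothetical.

```python
import math


def std_normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def g_star(v, lam, var_eps, mu, sigma):
    """Back-transform v plus a second-order correction for day-to-day noise."""
    scale = sigma / math.sqrt(2.0)        # undo the sqrt(2)/sigma standardization
    z = mu + scale * v                    # value on the Box-Cox scale
    if lam == 0:
        back = math.exp(z)
        second = scale ** 2 * back        # d^2/dv^2 of exp(mu + scale * v)
    else:
        back = (1.0 + lam * z) ** (1.0 / lam)
        second = scale ** 2 * (1.0 - lam) * (1.0 + lam * z) ** (1.0 / lam - 2.0)
    return back + 0.5 * var_eps * second


def usual_intake_episodic(eta_prob_wkday, eta_prob_wkend, eta_amt_wkday,
                          eta_amt_wkend, u_prob, u_amt, lam, var_eps, mu, sigma):
    """Weighted average of weekday (4/7) and weekend (3/7) usual intake."""
    t_wkday = std_normal_cdf(eta_prob_wkday + u_prob) * g_star(
        eta_amt_wkday + u_amt, lam, var_eps, mu, sigma)
    t_wkend = std_normal_cdf(eta_prob_wkend + u_prob) * g_star(
        eta_amt_wkend + u_amt, lam, var_eps, mu, sigma)
    return (4.0 * t_wkday + 3.0 * t_wkend) / 7.0
```

For a daily-consumed component or energy, the same `g_star` would be used without the probability factor. When `lam` equals 1 the back-transformation is linear, so the correction term vanishes, which gives a quick sanity check on the code.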
To find the joint distribution of usual intakes of the HEI-2005 scores, it
is convenient to use Monte Carlo methods. Recall that $w_i$ is the sampling
weight for individual $i$. Let $B$ be a large number; we set $B = 5000$. Generate
$b = 1, \ldots, B$ observations $U_{bi} = \mathrm{Normal}(0, \Sigma_u)$ and then obtain $T_{bi} = (T_{bij})_{j=1}^{13}$
by replacing $U_{ij}$ in their formulae by $U_{bij}$. With appropriate sample weight-
ing, the $T_{bi}$ can be used to estimate joint and marginal distributions. Thus,
for example, consider the total HEI-2005 score, which is a deterministic func-
tion of the usual intakes, say, $G(T_i)$. Its cumulative distribution function is
estimated as

(3.8)   $\widehat{F}(x) = \frac{\sum_{i=1}^{n} \sum_{b=1}^{B} I\{G(T_{bi}) \le x\} w_i}{B \sum_{i=1}^{n} w_i}.$

Frequentist standard errors of derived quantities such as mean, median and
quantiles can be estimated using the Balanced Repeated Replication (BRR)
method [Wolter (1995)]; see Section 5.7 for details.
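The survey-weighted Monte Carlo estimate of a distribution function can be sketched as follows. The sketch assumes the scores $G(T_{bi})$ have already been computed and stored in a B × n array, and it normalizes by B times the sum of the weights so that the estimate reaches 1 for large x. The function names and the `score_fn` callback are hypothetical.

```python
import numpy as np


def simulate_scores(score_fn, sigma_u, n, B, rng):
    """Draw B random-effect vectors per person and evaluate score_fn(i, u)."""
    dim = sigma_u.shape[0]
    U = rng.multivariate_normal(np.zeros(dim), sigma_u, size=(B, n))  # (B, n, dim)
    return np.array([[score_fn(i, U[b, i]) for i in range(n)] for b in range(B)])


def weighted_cdf(scores, weights, x):
    """Weighted proportion of simulated scores that are at most x."""
    scores = np.asarray(scores, dtype=float)   # shape (B, n)
    w = np.asarray(weights, dtype=float)       # shape (n,)
    B = scores.shape[0]
    indicator = scores <= x                    # I{G(T_bi) <= x}
    return float((indicator * w).sum() / (B * w.sum()))
```

For instance, the estimated percentage of the population with a total score exceeding 80 would be `100 * (1 - weighted_cdf(scores, weights, 80))`. Standard errors are not computed here; the paper obtains them from BRR weights.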

4. Comments on the approach to estimation. Our model (3.2)-(3.4) is
a highly nonlinear, mixed effects model with many latent variables and non-
linear restrictions on the covariance matrix $\Sigma_\epsilon$. As seen in Section 3.4, we
can estimate relevant distributions of usual intake in the population if we
can estimate $\Sigma_u$, $\Sigma_\epsilon$ and $\beta_j$ for $j = 1, \ldots, 19$. We have found that working
within a pseudo-likelihood Bayesian paradigm is a convenient way to do
this computation. We emphasize, however, that we are doing this only to get 
frequentist parameter estimates based on the well-known asymptotic equiv- 
alence of frequentist likelihood estimators and Bayesian posterior means, 
and especially the consistency of both [Lehmann and Casella (1998)]. We 
are specifically not doing Bayesian posterior inference, since valid Bayesian 
inference in a complex survey such as NHANES is an immensely challenging 
task, and because frequentist estimation and inference are the standard in 
the nutrition community. 

Kipnis et al. (2009) were able to get estimates of parameters separately
for each food group using the nonlinear mixed effects program NLMIXED in
SAS with sampling weights. While this gives estimates of $\beta_j$ for $j = 1, \ldots, 19$,
it only gives us parts of the covariance matrices $\Sigma_u$ and $\Sigma_\epsilon$, and not all
the entries. Using the 2001-2004 NHANES data, we have verified that our
estimates and the subset of the parameters that can be estimated by one food
group at a time using NLMIXED are in close agreement, and that estimates
of the distributions of usual intake and HEI-2005 component scores are also
in close agreement. We expect this because of the rather large sample size in
our data set. Zhang et al. (2011) have shown that even considering a single
food group plus energy is a challenge for the NLMIXED procedure, both
in time and in convergence, and using this method for the entire HEI-2005
constellation of dietary components is impossible.
Full technical details of the model fitting procedure are given in Appen-
dices A.1-A.10.

Of course, our model has assumptions, for example, additivity and ho- 
moscedasticity on a transformed scale for observed and latent variables, 
normality of person-specific random effects and normality of day-to-day vari- 
ability on the transformed scale. These assumptions are clearly not exactly 
correct, although our marginal model-checking suggests to us that they are 
mostly not disastrously wrong. Some reasons for this conclusion include the 
facts that we reproduce the marginal distributions of the components, that 
comparison with 24 h recalls shows differences that decrease when moving 
from one 24 h recall to two 24 h recalls, that q-q plots of the data are fairly 
satisfactory, etc. Thinking, as we do, of our work as a first step, and not 
a last step, it would be extremely interesting to make the model more gen-
eral, for example, skew-normal, skew-t or Dirichlet process distributions after
transformation, and possibly directly modeling heteroscedasticity. Such gen- 
eralizations will require effort to implement, but will speak to the robustness 
of the results and would be a useful future step. 

5. Empirical work. 

5.1. Basic analysis. We analyzed data from the 2001-2004 National 
Health and Nutrition Examination Survey (NHANES) for children ages 2-8. 
The study sample consisted of 2638 children, among whom 1103 children 
have two 24 h recalls and the rest have only one. We used the dietary intake 
data to calculate the 12 HEI-2005 components plus energy. In addition, be- 
sides age, gender, race and interaction terms, two covariates were employed, 
along with an intercept. The first was a dummy variable indicating whether 
or not the recall was for a weekend day (Friday, Saturday or Sunday) because 
food intakes are known to differ systematically on weekends and weekdays. 
The second was a dummy variable indicating whether the 24 h recall was 
the first or second such recall, the idea being that there may be systematic 
differences attributable to the repeated administration of the instrument. 

5.2. Contextual information. When we ran our program based on the 
variables in Table 1, the results were disastrous. Mixing of the MCMC sam- 
pler was very poor, with long sojourns in different regions. 

The reason for this failure to converge depends on the context of the di- 
etary variables. For example, whole grains are a subset of total grains. Thus, 
if someone consumes any whole grains, then necessarily, with probability 1.0, 
that person also consumes total grains. Such a restriction cannot be handled 
by our model, because it would force one of the random effects U to equal 
infinity. A similar thing happens for energy. Calories coming from saturated 
fat are a subset of total calories, as are calories from SoFAAS, so there is 
a restriction that total calories must be greater than calories from saturated
fat and also greater than calories from SoFAAS. Since the latter sum makes 
up a significant portion of calories, this restriction is not something that our 
model can handle well. 

Luckily, there is an easy and natural context-based solution. Instead of 
using total grains in the model, we used grains that are not whole grains, 
that is, refined grains, thus decoupling whole grains and total grains, and 
removing the restriction mentioned above. Similarly, instead of using total 
fruit, we use fruit that is not whole fruits, that is, fruit juices. Additionally, 
instead of using total vegetables, we use total vegetables excluding dark 
green and orange vegetables and legumes. Finally, instead of total energy, 
we use total energy minus the sum of energy from saturated fat (11% of 
mean energy) and from SoFAAS (35% of mean energy). We recognize that 
there is overlap of energy from saturated fat and energy from solid fat, but 
this has no impact on our analysis since total energy has sources other than 
these two. An alternative, of course, would have been to simply use total 
energy minus energy from SoFAAS.

This is sufficient to estimate the distributions of interest. If, for example,
in the new data set $T_{i1}$ represents usual intake of nonwhole fruits, and
$T_{i2}$ is usual intake of whole fruits, then the usual intake of total fruits
is $T_{i1} + T_{i2}$. Similar remarks apply for total grains and total
vegetables.

With these new variables, our model mixed well and gave reasonable 
looking answers that, as mentioned in Section 4, give similar results to other 
methods employed with smaller parts of the data set. 

5.3. Estimation of the HEI-2005 scores. In the Introduction we posed 4 
questions to which answers had not been possible previously. The first open 
question concerned the distribution of the HEI total score. Along the way 
toward this, Table 2 presents the energy-adjusted distributions of the dietary
components used in the HEI-2005. Table 3 presents the distributions of the
HEI-2005 individual component scores and the total score, with a graphical
view given in Figure 1. 

Table 3 presents the first estimates of the distribution of HEI-2005 scores 
for a vulnerable subgroup of the population, namely, children aged 2-8 years. 
A previous analysis of 2003-2004 NHANES data, looking separately at 2-5 
year olds and 6-11 year olds, was limited to estimates of mean usual HEI- 
2005 scores [59.6 and 54.7, respectively; see Fungwe et al. (2009)]. The mean 
scores noted here are comparable to those and reinforce the notion that chil- 
dren's diets, on average, are far from ideal. However, this analysis provides 
a more complete picture of the state of US children's diets. By including 
the scores at various percentiles, we estimate that only 5% of children have 
a score of 69 or greater and another 10% have scores of 41 or lower. While 
not in the table, we also estimate that the 99th percentile is 74. This analysis
suggests that virtually all children in the US have suboptimal diets and that
a sizeable fraction (10%) have alarmingly low scores (41 or lower).

Table 2
Estimated distributions of energy-adjusted usual intakes for children aged 2-8;
NHANES, 2001-2004

                                                                 Percentile
Component         Units                  Mean    5th   10th   25th   50th   75th   90th   95th
Total fruit       cups/(1000 kcal)       0.70   0.14   0.21   0.37   0.62   0.95   1.30   1.54
                                         0.02   0.02   0.02   0.02   0.02   0.03   0.05   0.07
Whole fruit       cups/(1000 kcal)       0.31   0.04   0.07   0.14   0.26   0.42   0.61   0.73
                                         0.02   0.01   0.01   0.02   0.02   0.03   0.04   0.06
Total vegetables  cups/(1000 kcal)       0.47   0.23   0.27   0.36   0.46   0.58   0.69   0.77
                                         0.01   0.02   0.02   0.02   0.01   0.02   0.03   0.03
DOL               cups/(1000 kcal)       0.05   0.00   0.01   0.02   0.03   0.07   0.11   0.15
                                         0.00   0.00   0.00   0.00   0.00   0.00   0.01   0.01
Total grains      ounces/(1000 kcal)     3.32   2.35   2.54   2.87   3.28   3.72   4.16   4.45
                                         0.05   0.08   0.07   0.06   0.05   0.06   0.08   0.10
Whole grains      ounces/(1000 kcal)     0.27   0.05   0.07   0.13   0.23   0.36   0.52   0.64
                                         0.01   0.01   0.01   0.02   0.01   0.02   0.03   0.04
Milk              cups/(1000 kcal)       0.97   0.28   0.38   0.60   0.90   1.26   1.64   1.90
                                         0.02   0.03   0.03   0.02   0.02   0.03   0.05   0.07
Meat and beans    ounces/(1000 kcal)     1.84   1.06   1.21   1.48   1.80   2.16   2.51   2.73
                                         0.04   0.09   0.08   0.06   0.04   0.04   0.05   0.07
Oil               grams/(1000 kcal)      7.13   4.05   4.60   5.63   6.93   8.41   9.90  10.89
                                         0.23   0.24   0.21   0.17   0.20   0.35   0.54   0.68
Saturated fat     % of energy           11.71   8.56   9.20  10.33  11.64  13.01  14.32  15.13
                                         0.15   0.25   0.20   0.15   0.15   0.22   0.32   0.38
Sodium            grams/(1000 kcal)      1.49   1.16   1.23   1.34   1.48   1.63   1.77   1.86
                                         0.01   0.02   0.02   0.01   0.01   0.02   0.03   0.03
SoFAAS            % of energy           36.93  27.19  29.28  32.87  36.90  40.96  44.61  46.77
                                         0.48   0.93   0.81   0.63   0.48   0.49   0.64   0.75

For each dietary component, the first line is the estimate from our model, while
the second line is its BRR-estimated standard error. Here, "DOL" is dark green
and orange vegetables and legumes. Also, "SoFAAS" is calories from solid fats,
alcoholic beverages and added sugars. Total Fruit, Whole Fruit, Total
Vegetables, DOL and Milk are in cups. Total Grains, Whole Grains and Meat and
Beans are in ounces. Oil and Sodium are in grams. Saturated Fat and SoFAAS are
in % of energy. Further discussion of the size of the BRR-estimated standard
errors is given in the supplementary material [Zhang et al. (2011)].

We have also considered whether our multivariate model fitting procedure
gives reasonable marginal answers. To check this, we note that it is possible
to use the SAS procedure NLMIXED separately for each component to fit
a model with one episodically consumed food group or daily consumed dietary
component together with energy. The marginal distributions of each such
component done separately are quite close to what we have reported in Table 3,
as is our mean, which is 53.50 compared to the mean of 53.25 based on
analyzing one HEI-2005 component at a time with the NLMIXED procedure. The
only case where there is a mild discrepancy is in the estimated variability of
the energy-adjusted usual intake of oils, likely caused by the NLMIXED
procedure itself, which has an estimated variance 9 times greater than our
estimated variance.

Table 3
Estimated distributions of the usual intake HEI-2005 scores

                                            Percentile
Component            Mean    5th   10th   25th   50th   75th   90th   95th
Total fruit          3.55   0.87   1.31   2.33   3.90   5.00   5.00   5.00
                     0.09   0.13   0.14   0.15   0.15   0.00   0.00   0.00
Whole fruit          3.14   0.49   0.82   1.71   3.24   5.00   5.00   5.00
                     0.14   0.12   0.16   0.21   0.26   0.03   0.00   0.00
Total vegetables     2.16   1.02   1.24   1.63   2.10   2.62   3.15   3.48
                     0.06   0.10   0.10   0.07   0.06   0.07   0.12   0.16
DOL                  0.62   0.05   0.09   0.21   0.45   0.86   1.38   1.76
                     0.04   0.02   0.03   0.04   0.05   0.06   0.08   0.13
Total grains         4.81   3.92   4.23   4.79   5.00   5.00   5.00   5.00
                     0.03   0.13   0.12   0.09   0.00   0.00   0.00   0.00
Whole grains         0.90   0.16   0.24   0.43   0.75   1.21   1.74   2.13
                     0.04   0.04   0.05   0.05   0.05   0.05   0.10   0.14
Milk                 6.77   2.15   2.96   4.62   6.91   9.67  10.00  10.00
                     0.12   0.23   0.22   0.18   0.17   0.25   0.00   0.00
Meat and beans       7.22   4.23   4.83   5.91   7.21   8.64  10.00  10.00
                     0.16   0.34   0.30   0.23   0.17   0.15   0.11   0.00
Oil                  5.92   3.37   3.83   4.69   5.77   7.01   8.25   9.07
                     0.18   0.20   0.18   0.14   0.17   0.29   0.45   0.57
Saturated fat        5.16   0.00   1.09   3.18   5.38   7.48   8.53   8.96
                     0.21   0.35   0.51   0.35   0.24   0.23   0.13   0.16
Sodium               4.52   1.25   2.05   3.31   4.62   5.83   6.85   7.44
                     0.09   0.30   0.24   0.15   0.09   0.11   0.16   0.19
SoFAAS               8.73   2.15   3.60   6.02   8.73  11.42  13.81  15.21
                     0.32   0.50   0.42   0.33   0.32   0.42   0.54   0.62
Total Score         53.50  37.42  40.74  46.73  53.68  60.36  65.87  68.96
                     0.81   1.45   1.34   1.09   0.83   0.82   0.96   1.08

For each component score, the first line is the estimate from our model, while
the second line is its BRR-estimated standard error. The total score is the sum
of the individual scores. Here, "DOL" is dark green and orange vegetables and
legumes. Also, "SoFAAS" is calories from solid fats, alcoholic beverages and
added sugars. Further discussion of the size of the BRR-estimated standard
errors is given in the supplementary material [Zhang et al. (2011)].

Fig. 1. The estimated percentiles of the HEI-2005 total score. The horizontal axis is the
percentile of interest, for example, 0.5 refers to the median, while the vertical axis gives
percentile of the HEI-2005 scores. Standard error estimates are given in Table 2.

Of course, it is the distribution of the HEI-2005 total score that cannot 
be estimated by analysis of one component at a time. 

There are other things that have not been computed previously that are 
simple by-products of our analysis. For example, the correlations among 
energy-adjusted usual intakes involving episodically consumed foods have
not been estimated previously, but this is easy for us; see Table 4. The
estimated correlation of -0.64 between energy-adjusted total fruit and
energy-adjusted SoFAAS, and the -0.47 correlation between DOL and SoFAAS, are
surprisingly high.



5.4. Component scores and other scores. As described in the Introduction,
an open problem has been to estimate the correlation between the individual
score on each dietary component and the sum of the scores of all other dietary
components. In their Table 3, Guenther et al. (2008) consider this problem, but
of course they did not have a model for usual energy-adjusted intakes, and
instead they used a single 24 h recall. In Table 5 we show the resulting
correlations using (a) a single 24 h recall; (b) the mean of two 24 h recalls
for those who have two 24 h recalls; and (c) our model for usual intake. The
numbers for the former differ from those of Guenther et al. (2008) because we
are considering here a different population than they do. A striking and not
unexpected aspect of this table is that for those components with nontrivial
correlations, the correlations all increase as one moves from a single 24 h
recall to the mean of two 24 h recalls and then finally to estimated usual
intake. Thus, for example, the correlation between the HEI-2005 score for
total fruit and its difference with the total score is 0.38 for a single 24 h
recall, 0.44 for the mean of two 24 h recalls and then finally 0.62 for usual
intake.

Table 4
Estimated correlation matrix for energy-adjusted usual intakes

Component            TF     WF     TV    DOL     TG     WG   Milk   Meat    Oil SatFat Sodium SoFAAS
TF                    1   0.76   0.07   0.41  -0.10   0.33   0.16   0.08  -0.35  -0.38  -0.25  -0.64
WF                           1   0.14   0.49   0.03   0.35   0.10   0.05  -0.17  -0.30  -0.20  -0.51
TV                                  1   0.51  -0.25  -0.23  -0.09   0.51  -0.08   0.08   0.42  -0.16
DOL                                        1  -0.08   0.11   0.14   0.25  -0.06  -0.23   0.01  -0.47
TG                                                1   0.30  -0.30  -0.13   0.44  -0.36   0.17  -0.22
WG                                                       1   0.18  -0.18  -0.11  -0.29  -0.17  -0.46
Milk                                                            1  -0.37  -0.21   0.21  -0.27  -0.21
Meat and beans                                                         1  -0.06  -0.08   0.39  -0.19
Oil                                                                           1  -0.06   0.11   0.05
SatFat                                                                               1   0.09   0.46
Sodium                                                                                      1   0.04
SoFAAS                                                                                             1

Here TF = Total Fruits, WF = Whole Fruits, TV = Total Vegetables, WG = Whole
Grains, TG = Total Grains, SatFat = Saturated Fat. Here, "DOL" is dark green and
orange vegetables and legumes. Also, "SoFAAS" is calories from solid fats,
alcoholic beverages and added sugars.

Table 5
Estimated correlations between each individual HEI-2005 component score and the sum
of the other HEI component scores, that is, the difference of the total score and each
individual component

Component          First 24 h   Two 24 h      Model   BRR s.e.
Total fruit              0.38       0.44       0.62       0.05
Whole fruit              0.31       0.37       0.59       0.10
Total vegetables         0.09       0.11       0.10       0.11
DOL                      0.18       0.24       0.41       0.07
Total grains             0.00       0.00       0.06       0.11
Whole grains             0.12       0.16       0.53       0.08
Milk                    -0.07      -0.01       0.01       0.08
Meat and beans          -0.03      -0.01      -0.03       0.15
Oil                      0.08       0.05      -0.17       0.08
Saturated fat            0.21       0.23       0.36       0.06
Sodium                  -0.03       0.05       0.07       0.12
SoFAAS                   0.52       0.59       0.72       0.04

The column labeled "Two 24 h" is the naive analysis that uses the mean of the two 24 h
recalls, while the column labeled "First 24 h" is the naive analysis that uses the first 24
h recall. The column labeled "Model" is our analysis, and the column labeled "BRR s.e."
is the estimated standard error of our estimates. Here, "DOL" is dark green and orange
vegetables and legumes. Also, "SoFAAS" is calories from solid fats, alcoholic beverages
and added sugars.
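
The naive "component versus the rest" correlations of Section 5.4 (the columns of
Table 5 based directly on 24 h recalls) can be illustrated with a short Python
sketch. It is only a sketch under stated assumptions: the function names
`pearson` and `component_rest_correlations` are hypothetical, the component
scores are assumed to have been computed already for each child, and survey
weights are ignored for brevity, which need not match the analysis reported
here.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Plain (unweighted) Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def component_rest_correlations(scores: Sequence[Sequence[float]]) -> List[float]:
    """scores[i][j] is the score of person i on component j (assumed layout).

    For each component j, correlate the component score with the total score
    minus that component, i.e. the sum of the other component scores.
    """
    totals = [sum(row) for row in scores]
    result = []
    for j in range(len(scores[0])):
        component = [row[j] for row in scores]
        rest = [t - c for t, c in zip(totals, component)]
        result.append(pearson(component, rest))
    return result
```

Applied to scores computed from a single recall, from the mean of two recalls,
or from model-based usual intakes, the same function would give the three kinds
of columns compared in the text, up to the omitted survey weighting.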

5.5. Distributions of intakes for subsets of HEI total scores. A third 
open question is as follows: among those whose total HEI-2005 score is 
>50 or <50, what is the distribution of energy-adjusted usual intake of 
whole grains, whole fruits, dark green and orange vegetables and legumes 
(DOL) and calories from solid fats, alcoholic beverages and added sugars 
(SoFAAS)? This follows naturally from our method. Following (3.8), let
$G_1(T_{bi})$ be energy-adjusted usual intake and let $G_2(T_{bi})$ be the HEI
total score. Then the distributions in question for when the total HEI-2005
score is >50 can be estimated as

\[
F(x) = \frac{\sum_{i=1}^{n} \sum_{b=1}^{B} w_i I\{G_1(T_{bi}) \le x\} I\{G_2(T_{bi}) > 50\}}
            {\sum_{i=1}^{n} \sum_{b=1}^{B} w_i I\{G_2(T_{bi}) > 50\}}.
\]

The results are provided in Table 6, with a graphical view in Figure 2. 
The results show that those who have poorer diets with usual HEI-2005 total 
score < 50 are consistently eating poorer diets, that is, less whole fruits, less 
whole grains and less DOL, but higher SoFAAS. 
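
The subgroup estimator of Section 5.5 is a ratio of survey-weighted indicator
sums over model-generated draws. A minimal Python sketch follows; the function
name `subgroup_cdf` and the nested-list layout (one list of draws per sampled
child) are assumptions made for illustration and are not taken from the
authors' programs.

```python
from typing import Sequence


def subgroup_cdf(x: float,
                 intake: Sequence[Sequence[float]],
                 total_score: Sequence[Sequence[float]],
                 weights: Sequence[float],
                 cutoff: float = 50.0,
                 above: bool = True) -> float:
    """Weighted estimate of P(intake <= x | total score above or below cutoff).

    intake[i][b] and total_score[i][b] are the b-th simulated usual intake and
    HEI total score for sampled person i; weights[i] is that person's survey
    weight (assumed layout).
    """
    numerator = 0.0
    denominator = 0.0
    for intake_i, score_i, w in zip(intake, total_score, weights):
        for g1, g2 in zip(intake_i, score_i):
            in_group = g2 > cutoff if above else g2 < cutoff
            if in_group:
                denominator += w
                if g1 <= x:
                    numerator += w
    if denominator == 0.0:
        raise ValueError("no draws fall in the requested score group")
    return numerator / denominator
```

Evaluating the function on a grid of `x` values, once with `above=True` and once
with `above=False`, would trace out curves of the kind summarized in Table 6.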

Table 6
Estimated distributions of energy-adjusted usual intake for those whose HEI-2005
total scores are <50 and >50

                                                     Percentile
Component              Mean   s.d.    5th   10th   25th   50th   75th   90th   95th
Whole fruit
  Total score <50      0.15   0.12   0.02   0.03   0.07   0.12   0.21   0.30   0.38
  Total score >50      0.39   0.22   0.11   0.15   0.23   0.35   0.51   0.68   0.80
Whole grains
  Total score <50      0.18   0.13   0.03   0.05   0.09   0.15   0.25   0.36   0.44
  Total score >50      0.32   0.20   0.07   0.10   0.17   0.28   0.42   0.59   0.70
DOL
  Total score <50      0.02   0.02   0.00   0.00   0.01   0.02   0.03   0.05   0.07
  Total score >50      0.06   0.05   0.01   0.01   0.03   0.05   0.09   0.13   0.17
SoFAAS
  Total score <50     42.43   3.97  36.40  37.59  39.66  42.16  44.92  47.67  49.42
  Total score >50     33.83   4.44  26.01  27.89  30.97  34.15  36.98  39.28  40.57
Total Score           53.50   9.58  37.42  40.74  46.73  53.68  60.36  65.87  68.96

Here, "DOL" is dark green and orange vegetables and legumes. Also, "SoFAAS" is calories
from solid fats, alcoholic beverages and added sugars. Units of measurement are given in
Table 2.
Fig. 2. The estimated percentiles of the energy-adjusted usual intakes for Whole fruits
(top left) in cups/(1,000 kcal), Whole grains (top right) in ounces/(1,000 kcal), DOL
(bottom left) in cups/(1,000 kcal) and calories from SoFAAS (bottom right) in % of
Energy. The solid lines are for those whose usual HEI-2005 total score is <50, that is,
poorer diets, while the dashed lines are for those whose usual HEI-2005 total score is >50,
that is, better diets.



5.6. Dietary consistency. We stated in the Introduction that it is
interesting to understand the percentage of children whose usual intake HEI
score exceeds the median HEI score on all 12 HEI components. Those median
scores, say, $(\kappa_1, \ldots, \kappa_{12})$, are estimated in Table 3. If
$G_j(T_{bi})$ is the HEI component score for episodically consumed food $j$,
then following (3.8) the quantity in question can be estimated as

\[
\frac{\sum_{i=1}^{n} \sum_{b=1}^{B} w_i \prod_{j} I\{G_j(T_{bi}) \ge \kappa_j\}}
     {\sum_{i=1}^{n} \sum_{b=1}^{B} w_i}.
\]

We estimate that the percentage is 6%, woefully small.
The percentage of children whose usual intake HEI score exceeds the median
HEI score on all 12 HEI components is 0.24%. Figure 3 gives the estimated
probabilities of exceeding the $k$th percentile on all 12 HEI components
simultaneously, for $k = 1, 2, \ldots, 99$.
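
The joint exceedance estimator of Section 5.6 can be sketched in Python as
follows. The name `joint_exceedance` and the nested-list layout are hypothetical,
and the thresholds are passed in by the caller (for example, the estimated
medians of the component scores); nothing here is taken from the authors' code.

```python
from typing import Sequence


def joint_exceedance(scores: Sequence[Sequence[Sequence[float]]],
                     weights: Sequence[float],
                     thresholds: Sequence[float]) -> float:
    """Weighted share of draws meeting or exceeding every threshold at once.

    scores[i][b][j] is the simulated score of person i, draw b, component j;
    weights[i] is the survey weight of person i (assumed layout).
    """
    numerator = 0.0
    denominator = 0.0
    for person_draws, w in zip(scores, weights):
        for draw in person_draws:
            denominator += w
            # The product of indicators equals 1 only if all components pass.
            if all(s >= k for s, k in zip(draw, thresholds)):
                numerator += w
    if denominator == 0.0:
        raise ValueError("weights sum to zero or no draws were supplied")
    return numerator / denominator
```

Calling the function repeatedly, with the thresholds set to the estimated
percentiles of each component for a sequence of percentile levels, would give a
curve of the kind plotted in Figure 3.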



Fig. 3. The Y-axis gives the estimated probabilities of exceeding the k (X-axis) percentile 
on all 12 HEI components, for k = 1, 2, . . . , 99; see Section 5.6. 



5.7. Uncertainty quantification. The BRR standard errors of HEI-2005 
components' adjusted usual intakes and scores are shown in Tables 2 and 3. 
The BRR weights are only used in variance calculations. Once we have 
estimated some quantity, say, $\theta$, from the sample using the sample
weights, we will need to compute the same quantity using, in succession, the
32 BRR weights. This will give us 32 estimates
$\widehat{\theta}_1, \widehat{\theta}_2, \ldots, \widehat{\theta}_{32}$. The
BRR estimate for the variance of $\widehat{\theta}$ is
$(32 \times 0.49)^{-1} \sum_{p=1}^{32} (\widehat{\theta}_p - \widehat{\theta})^2$.
The 32 in the denominator
is for the 32 different estimates from the 32 different sets of weights, and 
the 0.49 is the square of the perturbation factor used to construct the BRR 
weight sets [Wolter (1995)]. 
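
As an illustration of the variance formula above, here is a minimal Python
sketch. The function names are hypothetical; the default perturbation factor of
0.7 is inferred from the statement that 0.49 is its square, and the replicate
estimates are assumed to have been computed already, one per BRR weight set.

```python
import math
from typing import Sequence


def brr_variance(theta_hat: float,
                 replicate_estimates: Sequence[float],
                 perturbation: float = 0.7) -> float:
    """BRR variance: sum of squared deviations of the replicate estimates from
    the full-sample estimate, divided by (number of replicates * factor**2)."""
    n_rep = len(replicate_estimates)
    if n_rep == 0:
        raise ValueError("need at least one replicate estimate")
    squared_dev = sum((t - theta_hat) ** 2 for t in replicate_estimates)
    return squared_dev / (n_rep * perturbation ** 2)


def brr_standard_error(theta_hat: float,
                       replicate_estimates: Sequence[float],
                       perturbation: float = 0.7) -> float:
    """Square root of the BRR variance estimate."""
    return math.sqrt(brr_variance(theta_hat, replicate_estimates, perturbation))
```

With 32 replicate estimates the divisor is 32 x 0.49, matching the expression
in the text; the same function applies to any scalar summary, such as a
percentile of the total score.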

6. Further discussion of the analysis. 

6.1. Never consumers. An aspect of the modeling that we have not dis- 
cussed is the possibility that some people never, ever consume an episodically 
consumed dietary component. Our model does not allow for this, for general 
reasons and for reasons that are specific to our data analysis. 

It is in principle possible to add an additional modeling step for non- 
consumers, via fixed effects probit regression, but we do not think this is 
a practical issue in our case, for two reasons: 

• The first is that the HEI-2005 is based on 6 episodically consumed dietary 
components, namely, total fruit, whole fruit, whole grains, total vegeta- 
bles, DOL and milk, the latter of which includes cheese, yogurt and soy 
beverages. None of these are "lifestyle adverse," unlike, say, alcohol. While 
40% of the responses for whole fruits, for example, equal zero, the per- 
centage of children who never eat any whole fruits at all is likely to be 
minuscule. 
• Even if one disputes whether there are very few individuals who never
consume one of the dietary components, then it necessarily follows that we 
have overestimated the HEI-2005 total scores, and, hence, the estimates of 
the proportion of individuals with alarmingly low HEI scores are deflated, 
and not inflated. The reason is that our model suggests everyone has 
a positive usual intake of the 6 episodically consumed dietary components. 
Since the HEI-2005 score components are nondecreasing functions of usual 
intake of the episodically consumed dietary components, this would mean 
that we overestimate the HEI-2005 total score. 

6.2. Computing and data. Our programs were written in Matlab. The 
programs, along with the NHANES data we used, are available in the An- 
nals of Applied Statistics online archive. Although a much smaller amount 
of computing effort yields similar results, using 70,000 MCMC steps with 
a burn-in of 20,000 takes approximately 10 hours on a Linux server. 

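We also estimated the Monte Carlo standard error, which is defined by
Flegal, Haran and Jones (2008) as $\widehat{\sigma}_g / \sqrt{n}$, where $n$ is
the total number of iterations, and $n = ab$, where $a$ is the number of blocks
and $b$ is the block size, and where

\[
\bar{Y}_j = b^{-1} \sum_{i=(j-1)b+1}^{jb} g(X_i) \qquad \text{for } j = 1, \ldots, a.
\]

The batch means estimate of $\sigma_g^2$ is

\[
\widehat{\sigma}_g^2 = \frac{b}{a-1} \sum_{j=1}^{a} (\bar{Y}_j - \widehat{g}_n)^2,
\]

where $\widehat{g}_n$ is the mean of $g(X_i)$ over all $n$ iterations.

The ratio of the Monte Carlo standard error to the estimated standard
deviation of the estimated parameters averages 3.4% for $\Sigma_u$ and 1.7% for
$\beta$.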
Because of the public health importance of the problem, the National Can- 
cer Institute has contracted for the creation of a SAS program that performs 
our analysis. It will allow any number of episodically and daily consumed 
dietary components. The first draft of this program, written independently 
in a different programming language, gives almost identical results to what 
we have obtained, at least suggesting that our results are not the product 
of a programming error. 
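
The batch means calculation of the Monte Carlo standard error described earlier
in this section can be sketched in a few lines of Python. This is a generic
illustration of the batch means method, not the authors' Matlab code: the name
`batch_means_mcse` is hypothetical, and discarding trailing draws that do not
fill a complete batch is a choice made for this sketch.

```python
import math
from typing import Sequence


def batch_means_mcse(draws: Sequence[float], n_batches: int) -> float:
    """Monte Carlo standard error of the chain mean via nonoverlapping batches."""
    batch_size = len(draws) // n_batches if n_batches > 0 else 0
    if n_batches < 2 or batch_size < 1:
        raise ValueError("need at least two batches with one draw each")
    n = n_batches * batch_size
    used = list(draws[:n])  # drop draws that do not fill a complete batch
    overall_mean = sum(used) / n
    batch_means = [
        sum(used[j * batch_size:(j + 1) * batch_size]) / batch_size
        for j in range(n_batches)
    ]
    # Batch means estimate of the asymptotic variance of the chain average.
    sigma2 = batch_size / (n_batches - 1) * sum(
        (y - overall_mean) ** 2 for y in batch_means
    )
    return math.sqrt(sigma2 / n)
```

Dividing the returned value by the estimated standard deviation of a parameter
gives a ratio of the kind reported above.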

7. Discussion. 

7.1. Transformations. In Appendix A.12 we describe how we estimated
the transformation parameters as a separate component-wise calculation.
We have done some analyses where we simultaneously transform each
component, and found very little difference with our results. However, the
computing time to implement this is extremely high, because different
transformations put the data on different scales, so we have to compute
the usual intakes at each step in the MCMC, and not just at the end.

7.2. What have we learned that is new. There are many important questions
in dietary assessment that could not be answered because of a lack of
multivariate models for complex, zero-inflated data with measurement errors
and a lack of ability to fit such multivariate models. Nutrients and foods are
not consumed in isolation, but rather as part of a broader pattern of eating.
There is reason to believe that these various dietary components interact with
one another in their effect on health, sometimes working synergistically and
sometimes in opposition. Nonetheless, simply characterizing various patterns
of eating has presented an enormous statistical challenge.
Until now, descriptive statistics on the HEI-2005 have been limited to exam- 
ination of either the total scores or only a single energy-adjusted component 
at a time. This has precluded characterization of various patterns of dietary 
quality as well as any subsequent analyses of how such patterns might relate 
to health. 

The methodology presented in this paper offers a workable solution to these
problems, one that has already proven valuable. In May 2010, just as
we were submitting the paper, a White House Task Force on Childhood 
Obesity created a report. They had wanted to set a goal of all children 
having a total HEI score of 80 or more by 2030, but when they learned we 
estimated only 10% of the children ages 2-8 had a score of 66 or higher, they 
decided to set a more realistic target. The facility to estimate distributions of 
the multiple component scores simultaneously will be important in tracking 
progress toward that goal. 

7.3. In what other arenas will our work have impact? There are many 
other important problems where multivariate models such as ours will be 
important. One such problem arises when studying the relationship between 
multiple dietary components or dietary patterns and health outcomes. Tra- 
ditionally, for cost reasons, large cohort studies have used a food frequency 
questionnaire (FFQ) to measure dietary intake, sometimes with a small cal- 
ibration study including short-term measures such as 24 h recalls. However, 
there is a new web-based instrument called the Automated Self-administered
24-hour Dietary Recall (ASA24™) (see
http://riskfactor.cancer.gov/tools/instruments/asa24), which has been proposed
to replace or at least supplement the FFQ and which is currently undergoing
extensive testing. The dietary data we will see then are what we have called
$Y_{ijk}$, that is, 24 h recall data. In order to correct relative risk
estimates for the measurement error inherent in the ASA24™, regression
calibration [Carroll et al. (2006)] will almost certainly be the method of
choice, as it is in most of nutritional epidemiology. This method attempts to
produce an estimate of the regression of usual intake on the observed intakes,
and then to use these estimates in Cox and logistic regression for the health
outcome. In order to perform this regression, a multivariate measurement error
model will be required, since the regression is on all the observed dietary
intake components in the regression model measured by the ASA24™, and not on
each individual component. Our methodology is easily extended to address this
problem.

APPENDIX: DETAILS OF THE FITTING PROCEDURE 
In this Appendix we give the full details of the model fitting procedure. 

A.1. Notational convention. In our example, age was standardized to
have mean 0.0 and variance 1.0, to improve numerical stability.

As described in Section 3.1, the observed, transformed nonzero 24 h
recalls were standardized to have mean 0.0 and variance 2.0. More precisely,
for $\ell = 1, 2, \ldots, 6$, we first transformed the nonzero food group data
as $Z_{i,2\ell,k} = g(Y_{i,2\ell,k}, \lambda_\ell)$, and then we standardized
these data as
$Q_{i,2\ell,k} = \sqrt{2}\{Z_{i,2\ell,k} - \mu(\lambda_\ell)\}/\sigma(\lambda_\ell)$,
where $\{\mu(\lambda_\ell), \sigma(\lambda_\ell)\}$ are the mean and standard
deviation of the nonzero food intakes $Z_{i,2\ell,k}$. Similarly, for
nonepisodically consumed dietary components and energy we transformed to
$Z_{i,6+\ell,k} = g(Y_{i,6+\ell,k}, \lambda_\ell)$ for $\ell = 7, \ldots, 13$,
and then standardized to
$Q_{i,6+\ell,k} = \sqrt{2}\{Z_{i,6+\ell,k} - \mu(\lambda_\ell)\}/\sigma(\lambda_\ell)$.
Of course, whether the food group is consumed or not is
$Q_{i,2\ell-1,k} = Y_{i,2\ell-1,k}$ for $\ell = 1, \ldots, 6$. Collected, the
data are $Q_{ik} = (Q_{ijk})_{j=1}^{19}$. The terms
$\{\mu(\lambda_\ell), \sigma(\lambda_\ell)\}$ are not random variables but are
merely constants used for standardization, and we need not consider inference
for them. Back-transformation is discussed in Appendix A.11.
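
The standardization step of Appendix A.1 can be sketched in Python. Two points
are assumptions of this sketch and are not stated in this appendix: the
transformation g is taken to be the Box-Cox family, and the standard deviation
is computed with divisor equal to the number of nonzero values. The function
names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def box_cox(y: float, lam: float) -> float:
    """Box-Cox transform (assumed form of g): log(y) if lam is 0, else (y**lam - 1)/lam."""
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam


def standardize_nonzero(values: Sequence[float],
                        lam: float) -> Tuple[List[float], float, float]:
    """Transform the nonzero values, then scale them to mean 0 and variance 2.

    Returns the standardized values together with the centering and scaling
    constants, which are needed later for back-transformation.
    """
    z = [box_cox(y, lam) for y in values if y > 0]
    if not z:
        raise ValueError("no nonzero values to standardize")
    mu = sum(z) / len(z)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in z) / len(z))
    if sigma == 0.0:
        raise ValueError("cannot standardize a constant sequence")
    q = [math.sqrt(2.0) * (v - mu) / sigma for v in z]
    return q, mu, sigma
```

Keeping `mu` and `sigma` alongside the standardized values reflects the remark
above that these are fixed constants rather than parameters to be estimated.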

A.2. Prior distributions. Because the data were standardized, we used 
the following conventions: 

• The priors for all $\beta_j$ were normal with mean zero and variance 100.

• The prior for $\Sigma_u$ was exchangeable with diagonal entries all equal to
1.0 and correlations all equal to 0.50. There were 21 degrees of freedom in the
inverse Wishart prior, that is, $m_u = 21$. Thus, the prior is
$\mathrm{IW}\{(m_u - 19 - 1)\Sigma_{u,\mathrm{prior}}, m_u\}$. We experimented
with this prior by using zero correlation, and the results were essentially
unchanged.

• The prior for $r_k$ is Uniform[-1, 1]. Set the initial value: $r_k = 0$,
$k = 1, \ldots, 5$.

• The prior for $\theta_k$ is Uniform[$-\pi$, $\pi$]. Set the initial value:
$\theta_k = 0$, $k = 1, \ldots, 25$.

• The priors for $v_{22}, v_{44}, \ldots, v_{12,12}$ and
$v_{13,13}, \ldots, v_{19,19}$ were Uniform[-3, 3]. Set the initial values:
$v_{22} = v_{44} = \cdots = v_{12,12} = v_{13,13} = \cdots = v_{19,19} = 1$.

• For the rest of the nondiagonal $v_{ij}$'s which could not be determined by
the restrictions, we used Uniform[-3, 3] priors. Set the initial values to
be 0.

The constraints on $\Sigma_\varepsilon$ are nonlinear, and our parameterization
enforces them easily without having to have prior distributions for the
original parameterization that satisfy the nonlinear constraints.

The key thing that makes things work well with the other components
of the matrix $V$ with $\Sigma_\varepsilon = VV^{\mathrm{T}}$ is that we have
standardized the data as described in Appendix A.1. With this standardization,
things become much nicer. For example, the variance of the $\varepsilon$'s for
energy is $\sum_{j=1}^{19} v_{19,j}^2$. However, since the sample variance for
energy is standardized to equal 2.0, we simply need to make the priors for
$v_{19,j}$ uniform on a modest range to have real
flexibility. 
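
To make the prior settings concrete, the following Python sketch builds the
exchangeable prior covariance matrix, the inverse Wishart scale matrix implied
by the degrees of freedom, and the error variance implied by one row of V. All
function names are hypothetical, and matrices are represented as plain nested
lists purely for illustration.

```python
from typing import List, Sequence


def exchangeable_covariance(dim: int,
                            variance: float = 1.0,
                            correlation: float = 0.5) -> List[List[float]]:
    """Matrix with `variance` on the diagonal and a common correlation elsewhere."""
    return [[variance if i == j else correlation * variance for j in range(dim)]
            for i in range(dim)]


def inverse_wishart_scale(prior_cov: Sequence[Sequence[float]],
                          degrees_of_freedom: int) -> List[List[float]]:
    """Scale matrix (m_u - dim - 1) * prior covariance, as in the prior above."""
    factor = degrees_of_freedom - len(prior_cov) - 1
    return [[factor * v for v in row] for row in prior_cov]


def error_variance(v_row: Sequence[float]) -> float:
    """Diagonal entry of V V^T for one row of V: the sum of its squared entries."""
    return sum(v * v for v in v_row)
```

With 19 components and 21 degrees of freedom the scaling factor is 1, so the
scale matrix coincides with the prior covariance matrix; `error_variance`
applied to the energy row of V gives the sum of squares quoted in the text.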

A.3. Generating starting values for the latent variables. While we observe
$Q_{ik}$, in the MCMC we need to generate starting values for the latent
variables $W_{ik} = (W_{ijk})_{j=1}^{19}$ to initiate the MCMC:

• For nutrients and energy, $Q_{ijk} = W_{ijk}$, no data need be generated,
$j = 13, \ldots, 19$.

• For the amounts, $Q_{i2k}$, $Q_{i4k}$, $Q_{i6k}$, $Q_{i8k}$, $Q_{i,10,k}$ and
$Q_{i,12,k}$, we set $W_{i2k} = Q_{i2k}$, $W_{i4k} = Q_{i4k}$,
$W_{i6k} = Q_{i6k}$, $W_{i8k} = Q_{i8k}$, $W_{i,10,k} = Q_{i,10,k}$ and
$W_{i,12,k} = Q_{i,12,k}$.

• For consumption, we generate $U_i$ as normally distributed with mean zero
and covariance matrix given as the prior covariance matrix for $\Sigma_u$. For
$\ell = 1, \ldots, 6$, we also compute
$\mathcal{Z}_{ik} = |X_{i,2\ell-1,k}^{\mathrm{T}} \beta_{2\ell-1,\mathrm{prior}} + U_{i,2\ell-1} + Z_{ik}|$,
where $Z_{ik} = \mathrm{Normal}(0, 1)$ are generated independently. We then set
$W_{i,2\ell-1,k} = \mathcal{Z}_{ik} Q_{i,2\ell-1,k} - \mathcal{Z}_{ik}(1 - Q_{i,2\ell-1,k})$.

• Finally, we then updated $W_{ik}$ by a single application of the updates given
in Appendix A.9.
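
The starting value for a consumption latent variable described in the list above
amounts to drawing a magnitude and giving it the sign implied by the observed
indicator. A minimal Python sketch, with the hypothetical function name
`init_consumption_latent` and with the linear predictor and random effect
supplied by the caller:

```python
import random


def init_consumption_latent(linear_predictor: float,
                            random_effect: float,
                            consumed: int,
                            rng: random.Random) -> float:
    """Starting value for the latent consumption variable.

    The magnitude is |linear predictor + random effect + N(0, 1) noise|; the
    value is positive when the food was reported (consumed == 1) and negative
    otherwise, which is what magnitude * Q - magnitude * (1 - Q) produces.
    """
    magnitude = abs(linear_predictor + random_effect + rng.gauss(0.0, 1.0))
    return magnitude * consumed - magnitude * (1 - consumed)
```

The sign convention guarantees that the starting values already agree with the
observed consumption indicators before the first full update is applied.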

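A.4. Complete data loglikelihood. Let $J = 19$. The complete data include
the indicators of whether a food was consumed, the $W$ variables and
the random effect $U$ variables. The loglikelihood of the complete data is

\[
\begin{aligned}
& \sum_{\ell=1}^{6} \sum_{i=1}^{n} \sum_{k=1}^{m_i}
  \log\{Q_{i,2\ell-1,k} I(W_{i,2\ell-1,k} > 0)
  + (1 - Q_{i,2\ell-1,k}) I(W_{i,2\ell-1,k} < 0)\} \\
& \quad + \Bigl(\sum_{i=1}^{n} w_i/2\Bigr) \log(|\Sigma_u^{-1}|)
  - (1/2) \sum_{i=1}^{n} w_i U_i^{\mathrm{T}} \Sigma_u^{-1} U_i \\
& \quad - (1/2) \sum_{j=1}^{J} (\beta_j - \beta_{j,\mathrm{prior}})^{\mathrm{T}}
  \Omega_{\beta,j}^{-1} (\beta_j - \beta_{j,\mathrm{prior}}) \\
& \quad + \{(m_u + J + 1)/2\} \log(|\Sigma_u^{-1}|)
  - \{(m_u - J - 1)/2\} \operatorname{trace}(\Sigma_{u,\mathrm{prior}} \Sigma_u^{-1}) \\
& \quad - (1/2) \sum_{i=1}^{n} w_i m_i
  \log\Bigl\{(v_{22}^2 v_{44}^2 v_{66}^2 v_{88}^2 v_{10,10}^2 v_{12,12}^2
  v_{13,13}^2 \cdots v_{19,19}^2) \prod_{q=1}^{5} (1 - r_q^2)\Bigr\} \\
& \quad - (1/2) \sum_{i=1}^{n} w_i \sum_{k=1}^{m_i}
  \{W_{ik} - (X_{i1k}^{\mathrm{T}} \beta_1, \ldots, X_{iJk}^{\mathrm{T}} \beta_J)^{\mathrm{T}} - U_i\}^{\mathrm{T}}
  \Sigma_\varepsilon^{-1}
  \{W_{ik} - (X_{i1k}^{\mathrm{T}} \beta_1, \ldots, X_{iJk}^{\mathrm{T}} \beta_J)^{\mathrm{T}} - U_i\}.
\end{aligned}
\]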

We used Gibbs sampling to update this complete data loglikelihood, the 
details for which are given in subsequent appendices. The weights $w_i$ are
integers and are used here in a pseudo-likelihood fashion. One can also
think of this as expanding each individual into $w_i$ individuals, each with
the same observed data but different latent variables. For computational
convenience, since we are only asking for a frequentist estimator and not
doing full Bayesian inference, the latent variables in the process are generated
once for each individual. Estimates of $\Sigma_u$, $\Sigma_\varepsilon$ and
$\beta_j$ for $j = 1, \ldots, J$ were
computed as the means from the Gibbs samples. Once again, we emphasize 
that we are not doing a proper Bayesian analysis, but only using MCMC 
techniques to obtain a frequentist estimate, with uncertainty assessed using 
the frequentist BRR method. 

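A.5. Complete conditionals for $r_q$, $\theta_q$ and $v_{pq}$. Except for irrelevant
constants, the complete conditional for $r_q$ ($q = 1, \ldots, 5$) is

$$
\begin{aligned}
\log[r_q|\mathrm{rest}] &= -\frac{1}{2}\sum_{i=1}^{n} w_i m_i\log(1 - r_q^2) \\
&\quad - \frac{1}{2}\sum_{i=1}^{n}\sum_{k=1}^{m_i} w_i\{W_{ik} - (X_{i1k}^{T}\beta_1, \ldots, X_{i,19,k}^{T}\beta_{19})^{T} - U_i\}^{T}\Sigma_\epsilon^{-1}\{W_{ik} - (X_{i1k}^{T}\beta_1, \ldots, X_{i,19,k}^{T}\beta_{19})^{T} - U_i\}.
\end{aligned}
$$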



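Except for irrelevant constants, the complete conditionals for $v_{qq}$ ($q =
2, 4, 6, 8, 10, 12, 13, \ldots, 19$) are

$$
\begin{aligned}
\log[v_{qq}|\mathrm{rest}] &= -\frac{1}{2}\sum_{i=1}^{n} w_i m_i\log(v_{qq}^2) \\
&\quad - \frac{1}{2}\sum_{i=1}^{n}\sum_{k=1}^{m_i} w_i\{W_{ik} - (X_{i1k}^{T}\beta_1, \ldots, X_{i,19,k}^{T}\beta_{19})^{T} - U_i\}^{T}\Sigma_\epsilon^{-1}\{W_{ik} - (X_{i1k}^{T}\beta_1, \ldots, X_{i,19,k}^{T}\beta_{19})^{T} - U_i\}.
\end{aligned}
$$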



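Except for irrelevant constants, the complete conditionals for $\theta_q$ ($q =
1, \ldots, 25$) and nondiagonal free parameters $v_{pq}$ are

$$
\log[x|\mathrm{rest}] = -\frac{1}{2}\sum_{i=1}^{n}\sum_{k=1}^{m_i} w_i\{W_{ik} - (X_{i1k}^{T}\beta_1, \ldots, X_{i,19,k}^{T}\beta_{19})^{T} - U_i\}^{T}\Sigma_\epsilon^{-1}\{W_{ik} - (X_{i1k}^{T}\beta_1, \ldots, X_{i,19,k}^{T}\beta_{19})^{T} - U_i\}.
$$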

The full conditionals do not have an explicit form, so we use Metropolis-
Hastings steps within the Gibbs sampler to generate them:

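• $r_q$ ($q = 1, \ldots, 5$). We discretize the values of $r_q$ to the set $\{-0.99 + 2 \times
0.99(j - 1)/(M - 1)\}$, where $j = 1, \ldots, M$ and we choose $M = 41$.

Proposal: The current value is $r_{q,t}$. The proposed value $r_{q,t+1}$ is
selected randomly from the current value and the two nearest neighbors
of $r_{q,t}$. Then $r_{q,t+1}$ is accepted with probability $\min\{1, g(r_{q,t+1})/g(r_{q,t})\}$,
where

$$
g(r_q) \propto (1 - r_q^2)^{-\sum_{i=1}^{n} w_i m_i/2}
\times \exp\Big[-\frac{1}{2}\sum_{i=1}^{n}\sum_{k=1}^{m_i} w_i\{W_{ik} - (X_{i1k}^{T}\beta_1, \ldots, X_{i,19,k}^{T}\beta_{19})^{T} - U_i\}^{T}\Sigma_\epsilon^{-1}(\bullet)\Big],
$$

where here and in what follows, for any $A$, $A^{T}\Sigma_\epsilon^{-1}(\bullet) = A^{T}\Sigma_\epsilon^{-1}A$.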
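• $\theta_q$ ($q = 1, \ldots, 25$). We discretize similarly as above.

Proposal: The current value is $\theta_{q,t}$. The proposed value $\theta_{q,t+1}$ is selected
randomly from the current value and the two nearest neighbors of $\theta_{q,t}$.
Then $\theta_{q,t+1}$ is accepted with probability $\min\{1, g(\theta_{q,t+1})/g(\theta_{q,t})\}$, where

$$
g(\theta_q) \propto \exp\Big[-\frac{1}{2}\sum_{i=1}^{n}\sum_{k=1}^{m_i} w_i\{W_{ik} - (X_{i1k}^{T}\beta_1, \ldots, X_{i,19,k}^{T}\beta_{19})^{T} - U_i\}^{T}\Sigma_\epsilon^{-1}(\bullet)\Big].
$$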



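• $v_{qq}$ ($q = 2, 4, 6, 8, 10, 12, 13, \ldots, 19$). Proposal: The current value is $v_{qq,t}$.
A candidate $v_{qq,t+1}$ is generated from the Uniform distribution of length
0.4 with mean $v_{qq,t}$. The candidate value $v_{qq,t+1}$ is accepted with probability
$\min\{1, g(v_{qq,t+1})/g(v_{qq,t})\}$, where

$$
g(v_{qq}) \propto (v_{qq}^2)^{-\sum_{i=1}^{n} w_i m_i/2}
\times \exp\Big[-\frac{1}{2}\sum_{i=1}^{n}\sum_{k=1}^{m_i} w_i\{W_{ik} - (X_{i1k}^{T}\beta_1, \ldots, X_{i,19,k}^{T}\beta_{19})^{T} - U_i\}^{T}\Sigma_\epsilon^{-1}(\bullet)\Big].
$$

• Nondiagonal free parameters $v_{pq}$. Proposal: The current value is $v_{pq,t}$.
The candidate value $v_{pq,t+1}$ is generated from the Uniform distribution of
length 0.4 with mean $v_{pq,t}$. The candidate value is accepted with probability
$\min\{1, g(v_{pq,t+1})/g(v_{pq,t})\}$, where

$$
g(v_{pq}) \propto \exp\Big[-\frac{1}{2}\sum_{i=1}^{n}\sum_{k=1}^{m_i} w_i\{W_{ik} - (X_{i1k}^{T}\beta_1, \ldots, X_{i,19,k}^{T}\beta_{19})^{T} - U_i\}^{T}\Sigma_\epsilon^{-1}(\bullet)\Big].
$$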
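The two kinds of Metropolis-Hastings proposals described in the list above (a nearest-neighbor move on a discretized grid, and a uniform window of length 0.4 centered at the current value) can be sketched as follows. This is a minimal illustration, not the authors' Matlab code: `log_g` stands for the log of the relevant unnormalized conditional, `toy_log_g` is an invented stand-in target, and rejecting proposals that fall off the end of the grid is an assumption made here to keep the proposal symmetric, since the text does not say how the grid boundaries are handled.

```python
import math
import random

M = 41
# Grid {-0.99 + 2 * 0.99 * (j - 1) / (M - 1)}, j = 1..M (zero-based index here).
GRID = [-0.99 + 2 * 0.99 * j / (M - 1) for j in range(M)]

def accept(log_ratio, rng):
    """Metropolis rule: accept with probability min{1, exp(log_ratio)}."""
    return log_ratio >= 0 or rng.random() < math.exp(log_ratio)

def grid_step(idx, log_g, rng):
    """One update of a discretized parameter; returns the new grid index."""
    proposal = idx + rng.choice((-1, 0, 1))   # current value or a nearest neighbor
    if proposal < 0 or proposal >= M:         # assumption: off-grid moves are rejected
        return idx
    if accept(log_g(GRID[proposal]) - log_g(GRID[idx]), rng):
        return proposal
    return idx

def uniform_step(value, log_g, rng, length=0.4):
    """One update using a Uniform proposal of the given length centered at value."""
    candidate = value + rng.uniform(-length / 2, length / 2)
    if accept(log_g(candidate) - log_g(value), rng):
        return candidate
    return value

def toy_log_g(x):
    """Invented stand-in for a log conditional, peaked at 0.3."""
    return -10.0 * (x - 0.3) ** 2

rng = random.Random(0)
idx, v = M // 2, 1.0
for _ in range(1000):
    idx = grid_step(idx, toy_log_g, rng)
    v = uniform_step(v, toy_log_g, rng)
```

Working with log ratios rather than the ratio of `g` values avoids overflow when the exponent in the conditional is large in magnitude, which is likely here because it sums over all recalls of all individuals.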



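A.6. Complete conditionals for $\Sigma_u$. The dimension of the covariance
matrices is $J = 19$. By inspection, the complete conditional for $\Sigma_u$ is

$$[\Sigma_u|\mathrm{rest}] = \mathrm{IW}\Big\{(m_u - J - 1)\Sigma_{u,\mathrm{prior}} + \sum_{i=1}^{n} w_i U_i U_i^{T},\; n + m_u\Big\},$$

where here IW = the Inverse-Wishart distribution. The density of $\mathrm{IW}(\Omega, m)$
for a $J \times J$ random variable is

$$\mathrm{IW}(\Omega, m) = f(\Sigma|\Omega, m) \propto |\Sigma|^{-(m+J+1)/2}\exp\{-\tfrac{1}{2}\mathrm{trace}(\Omega\Sigma^{-1})\}.$$

This has expectation $\Omega/(m - J - 1)$.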
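A draw from this inverse-Wishart conditional can be sketched with NumPy alone. The sketch assumes an integer degrees-of-freedom value and uses the fact that the inverse of an IW(Ω, m) matrix is Wishart with scale Ω⁻¹, built here as a sum of m outer products of Normal(0, Ω⁻¹) vectors. The function names and the toy inputs are hypothetical; the degrees of freedom n + m_u are taken as written in the text.

```python
import numpy as np

def draw_inverse_wishart(omega, m, rng):
    """Draw from IW(omega, m): density prop. to |S|^{-(m+J+1)/2} exp{-tr(omega S^{-1})/2}.

    Assumes m is an integer with m >= J.
    """
    J = omega.shape[0]
    chol = np.linalg.cholesky(np.linalg.inv(omega))
    x = rng.standard_normal((m, J)) @ chol.T   # rows are Normal(0, omega^{-1})
    return np.linalg.inv(x.T @ x)              # inverse of a Wishart(omega^{-1}, m) draw

def draw_sigma_u(U, w, sigma_u_prior, m_u, rng):
    """Complete-conditional draw of Sigma_u given random effects U (n x J) and weights w."""
    n, J = U.shape
    weighted_ss = (U * w[:, None]).T @ U       # sum_i w_i U_i U_i^T
    scale = (m_u - J - 1) * sigma_u_prior + weighted_ss
    return draw_inverse_wishart(scale, n + m_u, rng)

rng = np.random.default_rng(0)
U = rng.standard_normal((30, 2))               # invented random effects, J = 2
w = np.full(30, 3.0)                           # invented integer-valued weights
sigma_u = draw_sigma_u(U, w, np.eye(2), 5, rng)
```

Averaging many draws of `draw_inverse_wishart(omega, m, rng)` should approach Ω/(m − J − 1), which gives a quick sanity check of the parameterization.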

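A.7. Complete conditionals for $\beta$. Let the elements of $\Sigma_\epsilon^{-1}$ be $\sigma_\epsilon^{j\ell}$. For
any $j$, except for irrelevant constants,

$$
\begin{aligned}
\log[\beta_j|\mathrm{rest}] &= -\frac{1}{2}(\beta_j - \beta_{j,\mathrm{prior}})^{T}\Omega_{\beta,j}^{-1}(\beta_j - \beta_{j,\mathrm{prior}}) \\
&\quad - \frac{1}{2}\sum_{i=1}^{n} w_i\sum_{k=1}^{m_i}\sigma_\epsilon^{jj}(W_{ijk} - X_{ijk}^{T}\beta_j - U_{ij})^2 \\
&\quad - \sum_{i=1}^{n} w_i\sum_{k=1}^{m_i}\sum_{\ell\neq j}\sigma_\epsilon^{j\ell}(W_{ijk} - X_{ijk}^{T}\beta_j - U_{ij})(W_{i\ell k} - X_{i\ell k}^{T}\beta_\ell - U_{i\ell}) \\
&= C_1^{T}\beta_j - \frac{1}{2}\beta_j^{T}C_2^{-1}\beta_j,
\end{aligned}
$$

which implies $[\beta_j|\mathrm{rest}] = \mathrm{Normal}(C_2C_1, C_2)$, where

$$
\begin{aligned}
C_2 &= \Big(\Omega_{\beta,j}^{-1} + \sum_{i=1}^{n} w_i\sum_{k=1}^{m_i}\sigma_\epsilon^{jj}X_{ijk}X_{ijk}^{T}\Big)^{-1}; \\
C_1 &= \Omega_{\beta,j}^{-1}\beta_{j,\mathrm{prior}} + \sum_{i=1}^{n} w_i\sum_{k=1}^{m_i}\sigma_\epsilon^{jj}X_{ijk}(W_{ijk} - U_{ij}) \\
&\quad + \sum_{i=1}^{n} w_i\sum_{k=1}^{m_i}\sum_{\ell\neq j}\sigma_\epsilon^{j\ell}(W_{i\ell k} - X_{i\ell k}^{T}\beta_\ell - U_{i\ell})X_{ijk}.
\end{aligned}
$$

A.8. Complete conditionals for $U_i$. The NHANES 2001-2004 weights
are integers, representing the number of children that each sampled child
represents. Thus, as described there, the loglikelihood in Section A.4 could
also be rewritten equivalently by developing $w_i$ pseudo-children, each with
the same observed data values. It thus does not make sense to use the
weights to generate an individual $U_i$. Instead, as described in Section A.4,
for computational convenience for generating a $U_i$ to represent $w_i$ children,
we set the weight for that child temporarily = 1.0. Then, except for irrelevant
constants,

$$
\begin{aligned}
\log[U_i|\mathrm{rest}] &= -\frac{1}{2}w_iU_i^{T}\Sigma_u^{-1}U_i - \frac{1}{2}w_i\sum_{k=1}^{m_i}\{W_{ik} - (X_{i1k}^{T}\beta_1, \ldots, X_{i,19,k}^{T}\beta_{19})^{T} - U_i\}^{T}\Sigma_\epsilon^{-1}(\bullet) \\
&= C_1^{T}U_i - \frac{1}{2}U_i^{T}C_2^{-1}U_i.
\end{aligned}
$$

Remembering that for purposes of this section we are setting $w_i = 1.0$, this
implies that $[U_i|\mathrm{rest}] = \mathrm{Normal}(C_2C_1, C_2)$, where

$$
\begin{aligned}
C_2 &= (\Sigma_u^{-1} + m_i\Sigma_\epsilon^{-1})^{-1}; \\
C_1 &= \sum_{k=1}^{m_i}\Sigma_\epsilon^{-1}\{W_{ik} - (X_{i1k}^{T}\beta_1, \ldots, X_{i,19,k}^{T}\beta_{19})^{T}\}.
\end{aligned}
$$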
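The Normal(C2 C1, C2) form of the conditional for the random effects translates directly into a few matrix operations. The sketch below assumes NumPy and takes C2 as the inverse of the sum of the random-effect precision and m_i times the error precision, which is what the quadratic terms of the log conditional give when the weight is set to 1.0. The function names and the input `resid` (the rows W_ik minus the fixed-effect part) are hypothetical.

```python
import numpy as np

def u_conditional(resid, sigma_u_inv, sigma_eps_inv):
    """Mean and covariance of [U_i | rest] with the weight set to 1.

    resid: (m_i, J) array whose k-th row is W_ik - (X_i1k^T beta_1, ..., X_iJk^T beta_J).
    """
    m_i = resid.shape[0]
    c2 = np.linalg.inv(sigma_u_inv + m_i * sigma_eps_inv)  # conditional covariance
    c1 = sigma_eps_inv @ resid.sum(axis=0)                 # sum_k Sigma_eps^{-1} resid_k
    return c2 @ c1, c2

def draw_u_i(resid, sigma_u_inv, sigma_eps_inv, rng):
    """One Gibbs draw of the J-vector of random effects for individual i."""
    mean, cov = u_conditional(resid, sigma_u_inv, sigma_eps_inv)
    return rng.multivariate_normal(mean, cov)

rng = np.random.default_rng(0)
resid = np.array([[1.0, 2.0], [3.0, 4.0]])   # invented residuals: m_i = 2 recalls, J = 2
u_draw = draw_u_i(resid, np.eye(2), np.eye(2), rng)
```

With identity precision matrices and two recalls, the conditional mean is simply the residual sum divided by three, which makes the shrinkage toward zero induced by the random-effect prior easy to see.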

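A.9. Complete conditional for $W_{i\ell k}$, $\ell = 1,3,5,7,9,11$. Here we do
the complete conditional for $W_{i\ell k}$ with $\ell = 1,3,5,7,9,11$. Except for irrelevant
constants,

$$
\begin{aligned}
\log[W_{i\ell k}|\mathrm{rest}] &= \log\{Q_{i\ell k}I(W_{i\ell k} > 0) + (1 - Q_{i\ell k})I(W_{i\ell k} < 0)\} \\
&\quad - \frac{1}{2}w_i(W_{i1k} - X_{i1k}^{T}\beta_1 - U_{i1}, \ldots, W_{i,19,k} - X_{i,19,k}^{T}\beta_{19} - U_{i,19})\Sigma_\epsilon^{-1}(\bullet) \\
&= \log\{Q_{i\ell k}I(W_{i\ell k} > 0) + (1 - Q_{i\ell k})I(W_{i\ell k} < 0)\} \\
&\quad - \frac{1}{2}w_i\sigma_\epsilon^{\ell\ell}(W_{i\ell k} - X_{i\ell k}^{T}\beta_\ell - U_{i\ell})^2 \\
&\quad - w_i\sum_{j\neq\ell}\sigma_\epsilon^{\ell j}(W_{i\ell k} - X_{i\ell k}^{T}\beta_\ell - U_{i\ell})(W_{ijk} - X_{ijk}^{T}\beta_j - U_{ij}) \\
&= \log\{Q_{i\ell k}I(W_{i\ell k} > 0) + (1 - Q_{i\ell k})I(W_{i\ell k} < 0)\} + C_1W_{i\ell k} - \frac{1}{2}W_{i\ell k}^2C_2^{-1},
\end{aligned}
$$

where, using the convention of Appendix A.8,

$$
\begin{aligned}
C_2 &= 1/\sigma_\epsilon^{\ell\ell}; \\
C_1 &= \sigma_\epsilon^{\ell\ell}(X_{i\ell k}^{T}\beta_\ell + U_{i\ell}) - \sum_{j\neq\ell}\sigma_\epsilon^{\ell j}(W_{ijk} - X_{ijk}^{T}\beta_j - U_{ij}).
\end{aligned}
$$

If we use the notation $\mathrm{TN}_+(\mu, \sigma, c)$ for a normal random variable with
mean $\mu$ and standard deviation $\sigma$ that is truncated from the left at $c$, and
similarly use $\mathrm{TN}_-(\mu, \sigma, c)$ when truncation is from the right at $c$, then it
follows that with $\mu = C_2C_1$ and $\sigma = C_2^{1/2}$,

$$
\begin{aligned}
[W_{i\ell k}|\mathrm{rest}] &= Q_{i\ell k}\mathrm{TN}_+(\mu, \sigma, 0) + (1 - Q_{i\ell k})\mathrm{TN}_-(\mu, \sigma, 0) \\
&= \mu + Q_{i\ell k}\mathrm{TN}_+(0, \sigma, -\mu) + (1 - Q_{i\ell k})\mathrm{TN}_-(0, \sigma, -\mu) \\
&= \mu + Q_{i\ell k}\mathrm{TN}_+(0, \sigma, -\mu) - (1 - Q_{i\ell k})\mathrm{TN}_+(0, \sigma, \mu) \\
&= \mu + \sigma\{Q_{i\ell k}\mathrm{TN}_+(0, 1, -\mu/\sigma) - (1 - Q_{i\ell k})\mathrm{TN}_+(0, 1, \mu/\sigma)\}.
\end{aligned}
$$

Generating $\mathrm{TN}_+(0, 1, c)$ is easy: if $c < 0$, simply do rejection sampling of
a Normal(0, 1) until you get one that is $> c$. If $c > 0$, there is an adaptive
rejection scheme [Robert (1995)].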
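The sampling recipe above can be sketched in standard-library Python. For a nonnegative truncation point the sketch uses the translated-exponential rejection sampler from Robert (1995), taken here to be the scheme the text refers to; the function names are hypothetical, and the case c = 0 is routed to the exponential branch as a convenience.

```python
import math
import random

def sample_tn_plus(c, rng):
    """Draw from Normal(0, 1) truncated from the left at c."""
    if c < 0:
        # Plain rejection: keep drawing until the value exceeds c.
        while True:
            z = rng.gauss(0.0, 1.0)
            if z > c:
                return z
    # Translated-exponential proposal with the optimal rate for this c.
    alpha = (c + math.sqrt(c * c + 4.0)) / 2.0
    while True:
        z = c + rng.expovariate(alpha)
        if rng.random() <= math.exp(-0.5 * (z - alpha) ** 2):
            return z

def sample_latent_w(mu, sigma, q, rng):
    """Draw the latent W given indicator q, using the last line of the display above."""
    if q == 1:
        return mu + sigma * sample_tn_plus(-mu / sigma, rng)   # forces W > 0
    return mu - sigma * sample_tn_plus(mu / sigma, rng)        # forces W < 0

rng = random.Random(0)
w_consumed = sample_latent_w(-1.5, 0.8, 1, rng)
w_not_consumed = sample_latent_w(1.5, 0.8, 0, rng)
```

The plain rejection branch accepts at least half of its draws when c < 0, while the exponential branch stays efficient even when the truncation point is far in the tail, which happens when the conditional mean strongly disagrees with the observed indicator.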

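A.10. Complete conditionals for $W_{i2k}$, $W_{i4k}$, $W_{i6k}$, $W_{i8k}$, $W_{i,10,k}$ and
$W_{i,12,k}$ when not observed. For $p = 2, 4, 6, 8, 10, 12$, the variable $W_{ipk}$ is
not observed when $Q_{i,p-1,k} = 0$, or, equivalently, when $W_{i,p-1,k} < 0$. Except
for irrelevant constants,

$$\log[W_{ipk}|\mathrm{rest}] = -\frac{1}{2}W_{ipk}^2C_2^{-1} + C_1W_{ipk},$$

where, using the convention of Appendix A.8,

$$
\begin{aligned}
C_2 &= 1/\sigma_\epsilon^{pp}; \\
C_1 &= \sigma_\epsilon^{pp}(X_{ipk}^{T}\beta_p + U_{ip}) - \sum_{j\neq p}\sigma_\epsilon^{pj}(W_{ijk} - X_{ijk}^{T}\beta_j - U_{ij}).
\end{aligned}
$$

Therefore,

$$[W_{ipk}|\mathrm{rest}] = Q_{ipk}Q_{i,p-1,k} + (1 - Q_{i,p-1,k})\mathrm{Normal}(C_2C_1, C_2).$$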
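The scalar conditional for an unobserved amount is the usual conditional normal expressed through one row of the error precision matrix, and can be sketched in pure Python. The function names are hypothetical, indices are zero-based, and `mean[j]` stands for the fixed-effect part plus the random effect of component j; the numeric example is invented.

```python
import math
import random

def amount_conditional(p, w, mean, prec):
    """Return (C2 * C1, C2) for component p given the other components.

    w:    current latent values W_{ijk}, j = 0..J-1 (entry p is ignored)
    mean: X^T beta_j + U_ij for each component j
    prec: J x J inverse of the error covariance, as nested lists
    """
    c2 = 1.0 / prec[p][p]
    cross = sum(prec[p][j] * (w[j] - mean[j]) for j in range(len(w)) if j != p)
    c1 = prec[p][p] * mean[p] - cross
    return c2 * c1, c2

def draw_unobserved_amount(p, w, mean, prec, rng):
    """Draw the latent amount when the food was not consumed on this recall."""
    cond_mean, cond_var = amount_conditional(p, w, mean, prec)
    return rng.gauss(cond_mean, math.sqrt(cond_var))

# Invented two-component example with error correlation 0.5 and unit variances.
prec = [[4.0 / 3.0, -2.0 / 3.0], [-2.0 / 3.0, 4.0 / 3.0]]
rng = random.Random(0)
draw = draw_unobserved_amount(1, [2.0, 0.0], [0.0, 1.0], prec, rng)
```

In the example the conditional mean is 1 + 0.5 × (2 − 0) = 2 and the conditional variance is 1 − 0.5² = 0.75, matching the textbook bivariate normal result.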




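A.11. Usual intake, standardization and transformation. Here we present
detailed formulas for functions defined in Section 3.4. When $\lambda = 0$, the
back-transformation is

$$g_{\mathrm{tr}}^{-1}(z, 0) = \exp\{\mu(0) + \sigma(0)z/\sqrt{2}\}.$$

When $\lambda \neq 0$, the back-transformation is

$$g_{\mathrm{tr}}^{-1}(z, \lambda) = [1 + \lambda\{\mu(\lambda) + \sigma(\lambda)z/\sqrt{2}\}]^{1/\lambda};$$

$$\partial^2 g_{\mathrm{tr}}^{-1}(z, \lambda)/\partial z^2 = \frac{\sigma^2(\lambda)}{2}(1 - \lambda)[1 + \lambda\{\mu(\lambda) + \sigma(\lambda)z/\sqrt{2}\}]^{-2+1/\lambda}.$$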
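As a quick numerical illustration of the back-transformation and its second derivative, the following sketch evaluates both. The function names are hypothetical, `mu` and `sigma` stand for μ(λ) and σ(λ), and the second derivative used for λ = 0 is not quoted from the text: it is obtained here by differentiating the exponential form twice.

```python
import math

def g_inv(z, lam, mu, sigma):
    """Back-transformation of the standardized value z for Box-Cox parameter lam."""
    t = mu + sigma * z / math.sqrt(2.0)
    if lam == 0:
        return math.exp(t)
    return (1.0 + lam * t) ** (1.0 / lam)    # requires 1 + lam * t > 0

def g_inv_second_derivative(z, lam, mu, sigma):
    """Second derivative of the back-transformation with respect to z."""
    t = mu + sigma * z / math.sqrt(2.0)
    if lam == 0:
        # Derived here from exp{mu + sigma z / sqrt(2)}; not stated in the text.
        return 0.5 * sigma ** 2 * math.exp(t)
    return 0.5 * sigma ** 2 * (1.0 - lam) * (1.0 + lam * t) ** (-2.0 + 1.0 / lam)

# Invented values: lam = 0.5 makes the back-transformation a quadratic in z.
value = g_inv(0.3, 0.5, 1.0, 0.8)
curvature = g_inv_second_derivative(0.3, 0.5, 1.0, 0.8)
```

For λ = 0.5 the back-transformation is the square of a linear function of z, so the second derivative is constant in z; this special case is a convenient check on the general formula.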

A.12. Transformation estimation. As part of an earlier project [Freedman
et al. (2010)], we estimated the transformations for one food/nutrient
at a time using the method of Kipnis et al. (2009), both for the data and 
also for each BRR weighted data set. To facilitate comparison with the one 
food/nutrient at a time analysis, in our analysis of all HEI-2005 components, 
we used these transformations as well. Of course, our methods can be gen- 
eralized to allow for estimation of the transformations as well. By allowing 
a different transformation for each BRR weighted data set, we have captured 
the variation due to estimation of the transformations. 

SUPPLEMENTARY MATERIAL 
Supplement A: Additional tables (DOI: 10.1214/10-AOAS446SUPPA).

Supplement B: Data files of the NHANES data used in the analysis (DOI:
10.1214/10-AOAS446SUPPB; .zip).

Supplement C: Matlab programs for the data analysis
(DOI: 10.1214/10-AOAS446SUPPC; .zip).

REFERENCES 

Buonaccorsi, J. P. (2010). Measurement Error: Models, Methods, and Applications.
CRC Press, Boca Raton, FL. MR2682774

Carriquiry, A. L. (1999). Assessing the prevalence of nutrient inadequacy. Public Health
Nutrition 2 23-33.

Carriquiry, A. L. (2003). Estimation of usual intake distributions of nutrients and foods.
Journal of Nutrition 133 601-608.

Carroll, R. J., Ruppert, D., Stefanski, L. A. and Crainiceanu, C. M. (2006).
Measurement Error in Nonlinear Models, 2nd ed. Monographs on Statistics and Applied
Probability 105. Chapman & Hall/CRC, Boca Raton, FL. MR2243417

Delaigle, A. (2008). An alternative view of the deconvolution problem. Statist. Sinica
18 1025-1045. MR2440402




Delaigle, A. and Hall, P. (2008). Using SIMEX for smoothing-parameter choice in
errors-in-variables problems. J. Amer. Statist. Assoc. 103 280-287. MR2394636 

Delaigle, A., Hall, P. and Meister, A. (2008). On deconvolution with repeated mea- 
surements. Ann. Statist. 36 665-685. MR2396811 

Delaigle, A. and Hall, P. (2011). Estimation of observation-error variance in
errors-in-variables regression. Statist. Sinica. To appear.

Delaigle, A. and Meister, A. (2008). Density estimation with heteroscedastic error. 
Bernoulli 14 562-579. MR2544102 

Ferrari, P., Roddam, A., Fahey, M. T., Jenab, M., Bamia, C., Ocké, M.,
Amiano, P., Hjartåker, A., Biessy, C., Rinaldi, S., Huybrechts, I., Tjønneland, A.,
Dethlefsen, C., Niravong, M., Clavel-Chapelon, F., Linseisen, J., Boeing, H.,
Oikonomou, E., Orfanos, P., Palli, D., Santucci de Magistris, M.,
Bueno-de-Mesquita, H. B., Peeters, P. H., Parr, C. L., Braaten, T., Dorronsoro, M.,
Berenguer, T., Gullberg, B., Johansson, I., Welch, A. A., Riboli, E.,
Bingham, S. and Slimani, N. (2009). A bivariate measurement error model for nitrogen
and potassium intakes to evaluate the performance of regression calibration in the Eu- 
ropean Prospective Investigation into Cancer and Nutrition study. European Journal 
of Clinical Nutrition 63 Supplement 4 S179-S187. 

Flegal, J. M., Haran, M. and Jones, G. L. (2008). Markov chain Monte Carlo: Can 
we trust the third significant figure? Statist. Sci. 23 250-260. MR2516823

Fraser, G. E. and Shavlik, D. J. (2004). Correlations between estimated and true
dietary intakes. Ann. Epidemiol. 14 287-295. 

Freedman, L. S., Guenther, P. M., Krebs-Smith, S. M., Dodd, K. W. and 
Midthune, D. (2010). A population's distribution of Healthy Eating Index-2005
component scores can be estimated when more than one 24-hour recall is available. J. Nutr.
140 1529-1534. 

Fuller, W. A. (1987). Measurement Error Models. Wiley, New York. MR0898653 

Fungwe, T., Guenther, P. M., Juan, W. Y., Hiza, H. and Lino, M. (2009). The
quality of children's diets in 2003-04 as measured by the Healthy Eating Index-2005. 
In Nutrition Insight 43. USDA Center for Nutrition Policy and Promotion. 

Guenther, P. M., Reedy, J. and Krebs-Smith, S. M. (2008). Development of the
Healthy Eating Index-2005. Journal of the American Dietetic Association 108 1896-1901.

Guenther, P. M., Reedy, J., Krebs-Smith, S. M. and Reeve, B. B. (2008). Evaluation 
of the Healthy Eating Index-2005. Journal of the American Dietetic Association 108 
1854-1864. 

Guolo, A. (2008). A flexible approach to measurement error correction in case-control
studies. Biometrics 64 1207-1214. 

Gustafson, P. (2004). Measurement Error and Misclassification in Statistics and
Epidemiology: Impacts and Bayesian Adjustments. Chapman & Hall/CRC, Boca Raton,
FL. MR2005104 

Kipnis, V., Midthune, D., Buckman, D. W., Dodd, K. W., Guenther, P. M.,
Krebs-Smith, S. M., Subar, A. F., Tooze, J. A., Carroll, R. J. and Freedman, L. S.
(2009). Modeling data with excess zeros and measurement error: Application to evaluat- 
ing relationships between episodically consumed foods and health outcomes. Biometrics 
65 1003-1010. 

Kipnis, V., Freedman, L. S., Carroll, R. J. and Midthune, D. (2011). A measurement
error model for episodically consumed foods and energy. Preprint. 

Kott, P. S., Guenther, P. M., Wagstaff, D. A., Juan, W. Y. and Kranz, S. (2009).
Fitting a linear model to survey data when the long-term average daily intake of a di- 
etary component is an explanatory variable. Survey Research Methods 3 157-165. 




Küchenhoff, H., Mwalili, S. M. and Lesaffre, E. (2006). A general method for
dealing with misclassification in regression: The misclassification SIMEX. Biometrics 
62 85-96, 315-316. MR2226560 

Lehmann, E. L. and Casella, G. (1998). Theory of Point Estimation, 2nd ed. Springer, 
New York. MR1639875 

Liang, H., Thurston, S. W., Ruppert, D., Apanasovich, T. and Hauser, R. 
(2008). Additive partial linear models with measurement errors. Biometrika 95 667-
678. MR2443182 

Messer, K. and Natarajan, L. (2008). Maximum likelihood, multiple imputation and 
regression calibration for measurement error adjustment. Stat. Med. 27 6332-6350. 

Natarajan, L. (2009). Regression calibration for dichotomized mismeasured predictors. 
Int. J. Biostat. 5 Art. 1143, 27. MR2504959 

Nusser, S. M., Fuller, W. A. and Guenther, P. M. (1997). Estimating usual dietary
intake distributions: Adjusting for measurement error and nonnormality in 24-hour 
food intake data. In Survey Measurement and Process Quality (L. Lyberg, P. Biemer, 
M. Collins, E. Deleeuw, C. Dippo, N. Schwartz and D. Trewin, eds.) 670-689. 
Wiley, New York. 

Nusser, S. M., Carriquiry, A. L., Dodd, K. W. and Fuller, W. A. (1996). A semi- 
parametric approach to estimating usual intake distributions. J. Amer. Statist. Assoc. 
91 1440-1449. 

Prentice, R. L. (1996). Measurement error and results from analytic epidemiology: Di- 
etary fat and breast cancer. J. Natl. Cancer Inst. 88 1738-1747. 

Prentice, R. L. (2003). Dietary assessment and the reliability of nutritional epidemiology 
reports. Lancet 362 182-183. 

Robert, C. P. (1995). Simulation of truncated normal variables. Statistics and Computing 
5 121-125. 

Staudenmayer, J., Ruppert, D. and Buonaccorsi, J. P. (2008). Density estimation 
in the presence of heteroscedastic measurement error. J. Amer. Statist. Assoc. 103 726- 
736. MR2524005 

Tooze, J. A., Grunwald, G. K. and Jones, R. H. (2002). Analysis of repeated measures
data with clumping at zero. Stat. Methods Med. Res. 11 341-355. 

Tooze, J. A., Midthune, D., Dodd, K. W., Freedman, L. S., Krebs-Smith, S. M.,
Subar, A. F., Guenther, P. M., Carroll, R. J. and Kipnis, V. (2006). A new
statistical method for estimating the usual intake of episodically consumed foods with 
application to their distribution. J. Am. Diet. Assoc. 106 1575-1587. 

Wand, M. P. (1998). Finite sample performance of deconvolving density estimators. 
Statist. Probab. Lett. 37 131-139. MR1620450 

Wolter, K. M. (1995). Introduction to Variance Estimation. Springer, New York. 

Zhang, S., Midthune, D., Perez, A., Buckman, D. W., Kipnis, V., Freedman, L. S., 
Dodd, K. W., Krebs-Smith, S. M. and Carroll, R. J. (2011). Fitting a bivariate 
measurement error model for episodically consumed dietary components. International 
Journal of Biostatistics 7 (1) Article 1. 

Zhang, S., Midthune, D., Guenther, P. M., Krebs-Smith, S. M., Kipnis, V., 
Dodd, K. W., Buckman, D. W., Tooze, J. A., Freedman, L. S. and 
Carroll, R. J. (2011). Supplement to "A new multivariate measurement error
model with zero-inflated dietary data, and its application to dietary
assessment." DOI: 10.1214/10-AOAS446SUPPA, DOI: 10.1214/10-AOAS446SUPPB, DOI:
10.1214/10-AOAS446SUPPC. 






S. Zhang
Merck & Co., Inc.
126 E. Lincoln Ave.
PO Box 2000 (RY34-A316)
Rahway, New Jersey 07065
USA
E-MAIL: saijuan.zhang@merck.com

P. M. Guenther
Center for Nutrition Policy and Promotion
U.S. Department of Agriculture
3101 Park Center Drive, Ste. 1034
Alexandria, Virginia 22302
USA
E-MAIL: Patricia.Guenther@cnpp.usda.gov

D. Buckman
Information Management Services, Inc.
12501 Prosperity Drive
Silver Spring, Maryland 20904
USA
E-MAIL: BuckmanD@imsweb.com

L. Freedman
Gertner Institute for Epidemiology
and Health Policy Research
Sheba Medical Center
Tel Hashomer 52161
Israel
E-MAIL: lsf@actcom.co.il

D. Midthune
V. Kipnis
K. Dodd
Biometry Research Group
Division of Cancer Prevention
National Cancer Institute
6130 Executive Boulevard EPN-3131
Bethesda, Maryland 20892-7354
USA
E-MAIL: midthund@mail.nih.gov
kipnisv@mail.nih.gov
doddk@mail.nih.gov

S. Krebs-Smith
Applied Research Program
Division of Cancer Control
and Population Sciences
National Cancer Institute
6130 Executive Boulevard, EPN-4005
Bethesda, Maryland 20892
USA
E-MAIL: krebssms@mail.nih.gov

J. Tooze
Department of Biostatistical Sciences
Wake Forest University, School
of Medicine
Medical Center Boulevard
Winston-Salem, North Carolina 27157
USA
E-MAIL: jtooze@wfubmc.edu

R. J. Carroll
Department of Statistics
Texas A&M University
3143 TAMU
College Station, Texas 77843-3143
USA
E-MAIL: carroll@stat.tamu.edu



